We study nonparametric maximum likelihood estimation of a log-concave probability density and its distribution and hazard function. Some general properties of these estimators are derived from two characterizations. It is shown that the rate of convergence with respect to supremum norm on a compact interval for the density and hazard rate estimator is at least $(\log(n)/n)^{1/3}$ and typically $(\log(n)/n)^{2/5}$, whereas the difference between the empirical and estimated distribution function vanishes with rate $o_p(n^{-1/2})$ under certain regularity assumptions.
1. Introduction

Two common approaches to nonparametric density estimation are smoothing methods and qualitative constraints. The former approach includes, among others, kernel density estimators, estimators based on discrete wavelets or other series expansions and estimators based on roughness penalization. Good starting points for the vast literature in this field are Silverman (1982, 1986) and Donoho et al. (1996). A common feature of all of these methods is that they involve certain tuning parameters, for example, the order of a kernel and the bandwidth. A proper choice of these parameters is far from trivial since optimal values depend on unknown properties of the underlying density $f$. The second approach avoids such problems by imposing qualitative properties on $f$, for example, monotonicity or convexity on certain intervals in the univariate case. Such assumptions are often plausible or even justified rigorously in specific applications.
Density estimation under shape constraints was first considered by Grenander (1956), who found that the nonparametric maximum likelihood estimator (NPMLE) $\hat f_n^{\mathrm{mon}}$ of a non-increasing density function $f$ on $[0, \infty)$ is given by the left derivative of the least concave majorant of the empirical cumulative distribution function on $[0, \infty)$. This work was continued by Rao (1969) and Groeneboom (1985, 1988), who established asymptotic distribution theory for $n^{1/3}(f - \hat f_n^{\mathrm{mon}})(t)$ at a fixed point $t > 0$ under certain regularity conditions and analyzed the non-Gaussian limit distribution. For various estimation problems involving monotone functions, the typical rate of convergence is $O_p(n^{-1/3})$ pointwise. The rate of convergence with respect to supremum norm is further decelerated by a factor of $\log(n)^{1/3}$ (Jonker and van der Vaart (2001)). For applications of monotone density estimation, consult, for example, Barlow et al. (1972) or Robertson et al. (1988).

Monotone estimation can be extended to cover unimodal densities. Recall that a density $f$ on the real line is unimodal if there exists a number $M = M(f)$ such that $f$ is non-decreasing on $(-\infty, M]$ and non-increasing on $[M, \infty)$. If the true mode is known a priori, unimodal density estimation boils down to monotone estimation in a straightforward manner, but the situation is different if $M$ is unknown. In that case, the likelihood is unbounded, problems being caused by observations too close to a hypothetical mode. Even if the mode were known, the density estimator would be inconsistent at the mode, a phenomenon called "spiking". Several methods were proposed to remedy this problem (see Wegman (1970), Woodroofe and Sun (1993), Meyer and Woodroofe (2004) or Kulikov and Lopuhaä (2006)), but all of them require additional constraints on $f$.

The combination of shape constraints and smoothing was assessed by Eggermont and LaRiccia (2000). To improve the slow rate of convergence of $n^{-1/3}$ in the space $L_1(\mathbb{R})$ for arbitrary unimodal densities, they derived a Grenander-type estimator by taking the derivative of the least concave majorant of an integrated kernel density estimator rather than the empirical distribution function directly, yielding a rate of convergence of $O_p(n^{-2/5})$.

Estimation of a convex decreasing density on $[0, \infty)$ was pioneered by Anevski (1994, 2003). The problem arose in a study of migrating birds discussed by Hampel (1987). Groeneboom et al. (2001) provide a characterization of the estimator, as well as consistency and limiting behavior at a fixed point of positive curvature of the function to be estimated. They found that the estimator must be piecewise linear with knots between the observation points. Under the additional assumption that the true density $f$ is twice continuously differentiable on $[0, \infty)$, they show that the MLE converges at rate $O_p(n^{-2/5})$ pointwise, somewhat better than in the monotone case. Monotonicity and convexity constraints on densities on $[0, \infty)$ have been embedded into the general framework of $k$-monotone densities by Balabdaoui and Wellner (2008). In a technical report, we provide a more thorough discussion of the similarities and differences between $k$-monotone density estimation and the present work (Dümbgen and Rufibach (2008)).

In the present paper, we impose an alternative, and quite natural, shape constraint on the density $f$, namely, log-concavity. That means

$$f(x) = \exp\varphi(x)$$

for some concave function $\varphi: \mathbb{R} \to [-\infty, \infty)$. This class is rather flexible, in that it generalizes many common parametric densities. These include all non-degenerate normal densities, all Gamma densities with shape parameter $\ge 1$, all Weibull densities with exponent $\ge 1$ and all beta densities with parameters $\ge 1$. Further examples are the logistic and Gumbel densities. Log-concave densities are of interest in econometrics; see Bagnoli and Bergstrom (2005) for a summary and further examples. Barlow and Proschan (1975) describe advantageous properties of log-concave densities in reliability theory, while Chang and Walther (2007) use log-concave densities as an ingredient in nonparametric mixture models. In nonparametric Bayesian analysis, too, log-concavity is of certain relevance (Brooks (1998)).

Note that log-concavity of a density implies that it is also unimodal. It will turn out 
that by imposing log-concavity, one circumvents the spiking problem mentioned before, 
which yields a new approach to estimating a unimodal, possibly skewed density. Moreover, 
the log-concave density estimator is fully automatic, in the sense that there is no need 
to select any bandwidth, kernel function or other tuning parameters. Finally, simulating 
data from the estimated density is rather easy. All of these properties make the new 
estimator appealing for use in statistical applications. 

Little large sample theory is available for log-concave estimators thus far. Sengupta and Paul (2005) considered testing for log-concavity of distribution functions on a compact interval. Walther (2002) introduced an extension of log-concavity in the context of certain mixture models, but his theory does not cover asymptotic properties of the density estimators themselves. Pal et al. (2006) proved the log-concave NPMLE to be consistent, but without rates of convergence.

Concerning the computation of the log-concave NPMLE, Walther (2002) and Pal et al. (2006) used a crude version of the iterative convex minorant (ICM) algorithm. A detailed description and comparison of several algorithms can be found in Rufibach (2007), while Dümbgen et al. (2007a) describe an active set algorithm, which is similar to the vertex reduction algorithms presented by Groeneboom et al. (2008) and seems to be the most efficient one at present. The ICM and active set algorithms are implemented within the R package "logcondens" by Rufibach and Dümbgen (2006), accessible via CRAN. Corresponding MATLAB code is available from the first author's homepage.

In Section 2, we introduce the log-concave maximum likelihood density estimator, discuss its basic properties and derive two characterizations. In Section 3, we illustrate this estimator with a real data example and explain briefly how to simulate data from the estimated density. Consistency of this density estimator and the corresponding estimator of the distribution function are treated in Section 4. It is shown that the supremum norm between the estimated density, $\hat f_n$, and the true density on compact subsets of the interior of $\{f > 0\}$ converges to zero at rate $O_p((\log(n)/n)^\gamma)$, with $\gamma \in [1/3, 2/5]$ depending on $f$'s smoothness. In particular, our estimator adapts to the unknown smoothness of $f$. Consistency of the density estimator entails consistency of the distribution function estimator. In fact, under additional regularity conditions on $f$, the difference between the empirical c.d.f. and the estimated c.d.f. is of order $o_p(n^{-1/2})$ on compact subsets of the interior of $\{f > 0\}$.
As a by-product of our estimator, note the following. Log-concavity of the density function $f$ also implies that the corresponding hazard function $h = f/(1 - F)$ is non-decreasing (cf. Barlow and Proschan (1975)). Hence, our estimators of $f$ and its c.d.f. $F$ entail a consistent and non-decreasing estimator of $h$, as pointed out at the end of Section 4.

Some auxiliary results, proofs and technical arguments are deferred to the Appendix. 



2. The estimators and their basic properties 

Let $X$ be a random variable with distribution function $F$ and Lebesgue density

$$f(x) = \exp\varphi(x)$$

for some concave function $\varphi: \mathbb{R} \to [-\infty, \infty)$. Our goal is to estimate $f$ based on a random sample of size $n > 1$ from $F$. Let $X_1 < X_2 < \cdots < X_n$ be the corresponding order statistics. For any log-concave probability density $f$ on $\mathbb{R}$, the normalized log-likelihood function at $f$ is given by

$$\int \log f \, d\mathbb{F}_n = \int \varphi \, d\mathbb{F}_n, \qquad (1)$$



where $\mathbb{F}_n$ stands for the empirical distribution function of the sample. In order to relax the constraint of $f$ being a probability density and to get a criterion function to maximize over the convex set of all concave functions $\varphi$, we employ the standard trick of adding a Lagrange term to (1), leading to the functional

$$\Psi_n(\varphi) := \int \varphi \, d\mathbb{F}_n - \int \exp\varphi(x) \, dx$$

(see Silverman (1982), Theorem 3.1). The nonparametric maximum likelihood estimator of $\varphi = \log f$ is the maximizer of this functional over all concave functions,



$$\hat\varphi_n := \mathop{\arg\max}_{\varphi \text{ concave}} \Psi_n(\varphi) \quad \text{and} \quad \hat f_n := \exp\hat\varphi_n.$$
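For a concave $\varphi$ that is linear between consecutive order statistics and equal to $-\infty$ outside $[X_1, X_n]$ (the shape that Theorem 2.1 below establishes for the maximizer), both integrals in $\Psi_n$ are available in closed form. The following Python sketch is a minimal illustration, assuming distinct observations; the helper names `exp_mean` and `psi_n` are hypothetical and not taken from any package.

```python
import math


def exp_mean(a: float, b: float) -> float:
    """Return the integral of exp((1 - u) * a + u * b) over u in [0, 1]."""
    d = b - a
    if d == 0.0:
        return math.exp(a)
    # expm1 keeps the ratio accurate when the two endpoint values are close
    return math.exp(a) * math.expm1(d) / d


def psi_n(x, phi) -> float:
    """Evaluate Psi_n(phi) = int phi dF_n - int exp(phi(t)) dt.

    x   -- sorted, distinct observations X_1 < ... < X_n
    phi -- values phi(X_1), ..., phi(X_n); phi is assumed linear between
           neighbouring observations and -inf outside [X_1, X_n]
    """
    n = len(x)
    empirical_term = sum(phi) / n  # each observation carries mass 1/n
    integral_term = sum(
        (x[j + 1] - x[j]) * exp_mean(phi[j], phi[j + 1]) for j in range(n - 1)
    )
    return empirical_term - integral_term
```

For instance, with observations 0, 0.5, 1, 2 and the uniform log-density $\varphi \equiv -\log 2$ on $[0, 2]$, the integral term equals 1 and `psi_n` returns $-\log 2 - 1$. Adding a constant $c \ne 0$ to $\varphi$ lowers the value, since $\Psi_n(\varphi + c) = \int \varphi \, d\mathbb{F}_n + c - e^c \int \exp\varphi$ is maximized exactly when $\int \exp(\varphi + c) = 1$; this is the effect of the Lagrange term.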



Existence, uniqueness and shape of $\hat\varphi_n$. One can easily show that $\Psi_n(\varphi) > -\infty$ if and only if $\varphi$ is real-valued on $[X_1, X_n]$. The following theorem was proven independently by Pal et al. (2006) and Rufibach (2006). It also follows from more general considerations in Dümbgen et al. (2007a), Section 2.

Theorem 2.1. The NPMLE $\hat\varphi_n$ exists and is unique. It is linear on all intervals $[X_j, X_{j+1}]$, $1 \le j < n$. Moreover, $\hat\varphi_n = -\infty$ on $\mathbb{R} \setminus [X_1, X_n]$.
Characterizations and further properties. We provide two characterizations of the estimators $\hat\varphi_n$, $\hat f_n$ and the corresponding distribution function $\hat F_n$, that is, $\hat F_n(x) = \int_{-\infty}^x \hat f_n(r) \, dr$. The first characterization is in terms of $\hat\varphi_n$ and perturbation functions.

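Theorem 2.2. Let $\varphi$ be a concave function such that $\{x : \varphi(x) > -\infty\} = [X_1, X_n]$. Then $\varphi = \hat\varphi_n$ if and only if

$$\int \Delta(x) \, d\mathbb{F}_n(x) \le \int \Delta(x) \exp\varphi(x) \, dx \qquad (2)$$

for any $\Delta: \mathbb{R} \to \mathbb{R}$ such that $\varphi + \lambda\Delta$ is concave for some $\lambda > 0$.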

Plugging suitable perturbation functions $\Delta$ into Theorem 2.2 yields valuable information about $\hat\varphi_n$ and $\hat F_n$. For a first illustration, let $\mu(G)$ and $\mathrm{Var}(G)$ be the mean and variance, respectively, of a distribution (function) $G$ on the real line with finite second moment. Setting $\Delta(x) := \pm x$ or $\Delta(x) := -x^2$ in Theorem 2.2 yields the following.

Corollary 2.3.

$$\mu(\hat F_n) = \mu(\mathbb{F}_n) \quad \text{and} \quad \mathrm{Var}(\hat F_n) \le \mathrm{Var}(\mathbb{F}_n).$$

Our second characterization is in terms of the empirical distribution function $\mathbb{F}_n$ and the estimated distribution function $\hat F_n$. For a continuous and piecewise linear function $h: [X_1, X_n] \to \mathbb{R}$, we define the set of its "knots" to be

$$\mathcal{S}_n(h) := \{t \in (X_1, X_n) : h'(t-) \ne h'(t+)\} \cup \{X_1, X_n\}.$$

Recall that $\hat\varphi_n$ is an example of such a function $h$ with $\mathcal{S}_n(\hat\varphi_n) \subset \{X_1, X_2, \ldots, X_n\}$.
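Because such an $h$ is linear between the points where it is evaluated, its knots can be read off from slope changes. A minimal Python sketch, assuming $h$ is given by its values at the sorted observations (as is the case for $\hat\varphi_n$ by Theorem 2.1); `knot_set` and its floating-point tolerance `tol` are hypothetical names introduced for illustration.

```python
def knot_set(x, h, tol=1e-9):
    """Return S_n(h) for a function h that is linear between consecutive x's.

    x   -- sorted, distinct points X_1 < ... < X_n (n >= 2)
    h   -- values h(X_1), ..., h(X_n)
    tol -- hypothetical tolerance for deciding that two slopes differ
    """
    slopes = [(h[j + 1] - h[j]) / (x[j + 1] - x[j]) for j in range(len(x) - 1)]
    # an interior point x[j] is a knot iff the slopes to its left and right differ
    interior = [
        x[j] for j in range(1, len(x) - 1) if abs(slopes[j] - slopes[j - 1]) > tol
    ]
    # the end points X_1 and X_n always belong to S_n(h)
    return [x[0]] + interior + [x[-1]]
```

For example, a tent function with values 0, 1, 2, 1, 0 at the points 0, 1, 2, 3, 4 has the single interior knot 2, so `knot_set` returns `[0.0, 2.0, 4.0]`.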

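Theorem 2.4. Let $\varphi$ be a concave function which is linear on all intervals $[X_j, X_{j+1}]$, $1 \le j < n$, while $\varphi = -\infty$ on $\mathbb{R} \setminus [X_1, X_n]$. Defining $F(x) := \int_{-\infty}^x \exp\varphi(r) \, dr$, we assume further that $F(X_n) = 1$. Then $\varphi = \hat\varphi_n$ and $F = \hat F_n$ if and only if for arbitrary $t \in [X_1, X_n]$,

$$\int_{X_1}^t F(r) \, dr \le \int_{X_1}^t \mathbb{F}_n(r) \, dr \qquad (3)$$

with equality in the case of $t \in \mathcal{S}_n(\varphi)$.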

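A particular consequence of Theorem 2.4 is that the distribution function estimator $\hat F_n$ is very close to the empirical distribution function $\mathbb{F}_n$ on $\mathcal{S}_n(\hat\varphi_n)$.

Corollary 2.5.

$$\mathbb{F}_n - n^{-1} \le \hat F_n \le \mathbb{F}_n \quad \text{on } \mathcal{S}_n(\hat\varphi_n).$$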
Figure 1. Distribution functions and the process $D(t)$ for a Gumbel sample.



Figure 1 illustrates Theorem 2.4 and Corollary 2.5. The upper plot displays $\mathbb{F}_n$ and $\hat F_n$ for a sample of $n = 25$ random numbers generated from a Gumbel distribution with density $f(x) = e^{-x}\exp(-e^{-x})$ on $\mathbb{R}$. The dotted vertical lines indicate the "kinks" of $\hat\varphi_n$, that is, all $t \in \mathcal{S}_n(\hat\varphi_n)$. Note that $\mathbb{F}_n$ and $\hat F_n$ are indeed very close on the latter set, with equality at the right end-point $X_n$. The lower plot shows the process

$$D(t) := \int_{X_1}^t (\hat F_n - \mathbb{F}_n)(r) \, dr$$

for $t \in [X_1, X_n]$. As predicted by Theorem 2.4, this process is non-positive and equals zero on $\mathcal{S}_n(\hat\varphi_n)$.

3. A data example 

In a recent consulting case, a company asked for Monte Carlo experiments to predict the reliability of a certain device they produce. The reliability depended in a certain deterministic way on five different and independent random input parameters. For each input parameter, a sample was available and the goal was to fit a suitable distribution to simulate from. Here, we focus on just one of these input parameters.

At first, we considered two standard approaches to estimate the unknown density $f$, namely, (i) fitting a Gaussian density $\hat f_{\mathrm{par}}$ with mean $\mu(\mathbb{F}_n)$ and variance $\hat\sigma^2 := n(n-1)^{-1}\mathrm{Var}(\mathbb{F}_n)$; (ii) the kernel density estimator

$$\hat f_{\mathrm{ker}}(x) := \int \phi_{\hat\sigma/\sqrt{n}}(x - y) \, d\mathbb{F}_n(y),$$

where $\phi_\sigma$ denotes the density of $\mathcal{N}(0, \sigma^2)$. This very small bandwidth $\hat\sigma/\sqrt{n}$ was chosen to obtain a density with variance $\hat\sigma^2$ and to avoid putting too much weight into the tails.
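The variance claim follows from $\mathrm{Var}(\hat f_{\mathrm{ker}}) = \mathrm{Var}(\mathbb{F}_n) + \hat\sigma^2/n = \hat\sigma^2$. The Python sketch below, with hypothetical function names, evaluates $\hat f_{\mathrm{ker}}$; it relies on the fact that `statistics.variance` uses the divisor $n - 1$ and therefore equals $\hat\sigma^2$, while `statistics.pvariance` equals $\mathrm{Var}(\mathbb{F}_n)$.

```python
import math
import statistics


def kernel_bandwidth(data):
    """Bandwidth hat sigma / sqrt(n), where hat sigma^2 = n/(n-1) Var(F_n)."""
    # statistics.variance divides by n - 1, i.e. it equals n/(n-1) * Var(F_n)
    return math.sqrt(statistics.variance(data) / len(data))


def f_ker(x, data):
    """Gaussian kernel density estimate at x with bandwidth hat sigma / sqrt(n)."""
    h = kernel_bandwidth(data)
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in data) / norm
```

As a check, `statistics.pvariance(data) + kernel_bandwidth(data) ** 2` reproduces `statistics.variance(data)` up to rounding, which is the identity stated above.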

Looking at the data, approach (i) is clearly inappropriate because our sample of size $n = 787$ revealed a skewed and significantly non-Gaussian distribution. This can be seen in Figure 2, where the multimodal curve corresponds to $\hat f_{\mathrm{ker}}$, while the dashed line depicts $\hat f_{\mathrm{par}}$. Approach (ii) yielded Monte Carlo results agreeing well with measured reliabilities, but the engineers questioned the multimodality of $\hat f_{\mathrm{ker}}$. Choosing a kernel estimator with larger bandwidth would overestimate the variance and put too much weight into the tails. Thus, we agreed on a third approach and estimated $f$ by a slightly smoothed version of $\hat f_n$,

$$\hat f_n^*(x) := \int \phi_{\hat\gamma}(x - y) \, d\hat F_n(y),$$

with $\hat\gamma^2 := \hat\sigma^2 - \mathrm{Var}(\hat F_n)$, so that the variance of $\hat f_n^*$ coincides with $\hat\sigma^2$ (note that $\hat\gamma^2 \ge 0$ by Corollary 2.3). Since log-concavity is preserved under convolution (cf. Prékopa (1971)), $\hat f_n^*$ is also log-concave. For the explicit computation of $\mathrm{Var}(\hat F_n)$, see Dümbgen et al. (2007a). By smoothing, we also avoid the small discontinuities of $\hat f_n$ at $X_1$ and $X_n$. This density estimator is the skewed unimodal curve in Figure 2. It also yielded convincing results in the Monte Carlo simulations.

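Note that both estimators $\hat f_n$ and $\hat f_n^*$ are fully automatic. Moreover, it is very easy to sample from these densities: let $\mathcal{S}_n(\hat\varphi_n)$ consist of $x_0 < x_1 < \cdots < x_m$, and consider the data $X_i$ temporarily as fixed. Now,

(a) generate a random index $J \in \{1, 2, \ldots, m\}$ with $P(J = j) = \hat F_n(x_j) - \hat F_n(x_{j-1})$;

(b) generate

$$X := x_{J-1} + (x_J - x_{J-1}) \cdot \begin{cases} \log\bigl(1 + (e^{\Theta} - 1)U\bigr)/\Theta & \text{if } \Theta \ne 0, \\ U & \text{if } \Theta = 0, \end{cases}$$

where $\Theta := \hat\varphi_n(x_J) - \hat\varphi_n(x_{J-1})$ and $U \sim \mathrm{Unif}[0, 1]$;

(c) generate

$$X^* := X + \hat\gamma Z \quad \text{with } Z \sim \mathcal{N}(0, 1),$$

where $J$, $U$ and $Z$ are independent. Then $X \sim \hat f_n$ and $X^* \sim \hat f_n^*$.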
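Steps (a) to (c) translate directly into code. The Python sketch below is a minimal illustration, assuming the knots $x_0 < \cdots < x_m$ of $\hat\varphi_n$ and the values $\hat\varphi_n(x_j)$ are already available (for example, from an implementation such as the R package "logcondens"); the function names are hypothetical. Step (b) uses `math.log1p` and `math.expm1`, which is numerically safer than the literal formula when $\Theta$ is close to zero.

```python
import math
import random


def _segment_mass(a, b, width):
    """Integral of exp(phi) over a segment on which phi goes linearly from a to b."""
    d = b - a
    return width * (math.exp(a) if d == 0.0 else math.exp(a) * math.expm1(d) / d)


def sample_log_linear(knots, phi, rng, gamma=0.0):
    """Draw one value following steps (a)-(c).

    knots -- x_0 < x_1 < ... < x_m, the knots of hat phi_n
    phi   -- hat phi_n(x_0), ..., hat phi_n(x_m); linear in between
    rng   -- a random.Random instance
    gamma -- hat gamma; gamma = 0 gives X ~ hat f_n, gamma > 0 gives X*
    """
    m = len(knots) - 1
    # (a) segment j has probability hat F_n(x_j) - hat F_n(x_{j-1}); rng.choices
    # renormalises the weights, so phi need not integrate exactly to one
    masses = [
        _segment_mass(phi[j - 1], phi[j], knots[j] - knots[j - 1])
        for j in range(1, m + 1)
    ]
    j = rng.choices(range(1, m + 1), weights=masses)[0]
    # (b) invert the conditional c.d.f. on [x_{j-1}, x_j]
    theta = phi[j] - phi[j - 1]
    u = rng.random()
    s = u if theta == 0.0 else math.log1p(math.expm1(theta) * u) / theta
    x = knots[j - 1] + (knots[j] - knots[j - 1]) * s
    # (c) optional Gaussian smoothing, giving a draw from hat f_n^*
    if gamma > 0.0:
        x += gamma * rng.gauss(0.0, 1.0)
    return x
```

With `gamma = 0.0` every draw lies in $[x_0, x_m]$; for a single segment with $\hat\varphi_n$ dropping from 0 to $-2$ on $[0, 1]$, the sample mean of many draws approaches that of the density proportional to $e^{-2x}$ on $[0, 1]$, roughly 0.343.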







Figure 2. Three competing density estimators. 



4. Uniform consistency 

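Let us introduce some notation. For any integer $n > 1$, we define

$$\rho_n := \log(n)/n$$

and the uniform norm of a function $g: I \to \mathbb{R}$ on an interval $I \subset \mathbb{R}$ is denoted by

$$\|g\|_\infty^I := \sup_{x \in I} |g(x)|.$$

We say that $g$ belongs to the Hölder class $\mathcal{H}^{\beta,L}(I)$ with exponent $\beta \in [1, 2]$ and constant $L > 0$ if for all $x, y \in I$, we have

$$|g(x) - g(y)| \le L|x - y| \quad \text{if } \beta = 1,$$
$$|g'(x) - g'(y)| \le L|x - y|^{\beta - 1} \quad \text{if } \beta > 1.$$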
Uniform consistency of $\hat\varphi_n$. Our main result is the following theorem.

Theorem 4.1. Assume for the log-density $\varphi = \log f$ that $\varphi \in \mathcal{H}^{\beta,L}(T)$ for some exponent $\beta \in [1, 2]$, some constant $L > 0$ and a subinterval $T = [A, B]$ of the interior of $\{f > 0\}$. Then

$$\max_{t \in T} (\hat\varphi_n - \varphi)(t) = O_p\bigl(\rho_n^{\beta/(2\beta+1)}\bigr),$$
$$\max_{t \in T(n,\beta)} (\varphi - \hat\varphi_n)(t) = O_p\bigl(\rho_n^{\beta/(2\beta+1)}\bigr),$$

where $T(n, \beta) := [A + \rho_n^{1/(2\beta+1)}, B - \rho_n^{1/(2\beta+1)}]$.

Note that the previous result remains true when we replace $\hat\varphi_n - \varphi$ with $\hat f_n - f$. It is well known that the rates of convergence in Theorem 4.1 are optimal, even if $\beta$ were known (cf. Khas'minskii (1978)). Thus, our estimators adapt to the unknown smoothness of $f$ in the range $\beta \in [1, 2]$.
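The exponents in Theorem 4.1 interpolate between the two cases mentioned in the introduction. A small Python helper, with hypothetical function names, makes the dependence on $\beta$ explicit; passing `fractions.Fraction` values gives exact exponents.

```python
import math
from fractions import Fraction


def density_exponent(beta):
    """Exponent beta / (2 beta + 1) of rho_n in Theorem 4.1."""
    return beta / (2 * beta + 1)


def boundary_exponent(beta):
    """Exponent 1 / (2 beta + 1) of rho_n in the definition of T(n, beta)."""
    return 1 / (2 * beta + 1)


def rho(n):
    """rho_n = log(n) / n."""
    return math.log(n) / n
```

For $\beta = 1$ the exponent is $1/3$ and for $\beta = 2$ it is $2/5$. Numerically, for $n = 1000$ the rate $\rho_n^{\beta/(2\beta+1)}$ is roughly 0.19 when $\beta = 1$ and 0.14 when $\beta = 2$.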

Also, note that concavity of $\varphi$ implies that it is Lipschitz-continuous, that is, belongs to $\mathcal{H}^{1,L}(T)$ for some $L > 0$ on any interval $T = [A, B]$ with $A > \inf\{f > 0\}$ and $B < \sup\{f > 0\}$. Hence, one can easily deduce from Theorem 4.1 that $\hat f_n$ is consistent in $L_1(\mathbb{R})$ and that $\hat F_n$ is uniformly consistent.

Corollary 4.2.

$$\int |\hat f_n(x) - f(x)| \, dx \to_p 0 \quad \text{and} \quad \|\hat F_n - F\|_\infty^{\mathbb{R}} \to_p 0.$$

Distance of two consecutive knots and uniform consistency of $\hat F_n$. By means of Theorem 4.1, we can solve a "gap problem" for log-concave density estimation. The term "gap problem" was first used by Balabdaoui and Wellner (2008) to describe the problem of computing the distance between two consecutive knots of certain estimators.

Theorem 4.3. Suppose that the assumptions of Theorem 4.1 hold. Assume, further, that $\varphi'(x) - \varphi'(y) \ge C(y - x)$ for some constant $C > 0$ and arbitrary $A \le x < y \le B$, where $\varphi'$ stands for $\varphi'(\cdot\,-)$ or $\varphi'(\cdot\,+)$. Then

$$\sup_{t \in T} \min_{x \in \mathcal{S}_n(\hat\varphi_n)} |x - t| = O_p\bigl(\rho_n^{\beta/(4\beta+2)}\bigr).$$

Theorems 4.1 and 4.3, combined with a result of Stute (1982) about the modulus of continuity of empirical processes, yield a rate of convergence for the maximal difference between $\hat F_n$ and $\mathbb{F}_n$ on compact intervals.

Theorem 4.4. Under the assumptions of Theorem 4.3,

$$\max_{t \in T(n,\beta)} |\hat F_n(t) - \mathbb{F}_n(t)| = O_p\bigl(\rho_n^{3\beta/(4\beta+2)}\bigr).$$
In particular, if $\beta > 1$, then

$$\max_{t \in T(n,\beta)} |\hat F_n(t) - \mathbb{F}_n(t)| = o_p(n^{-1/2}).$$
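The threshold $\beta > 1$ comes from comparing exponents. A short derivation, using only the rate in Theorem 4.4 and the definition $\rho_n = \log(n)/n$, is:

```latex
\frac{3\beta}{4\beta+2} > \frac{1}{2}
  \iff 6\beta > 4\beta + 2
  \iff \beta > 1,
\qquad
n^{1/2}\rho_n^{3\beta/(4\beta+2)}
  = \log(n)^{3\beta/(4\beta+2)}\, n^{1/2 - 3\beta/(4\beta+2)}
  \longrightarrow 0 \quad (\beta > 1).
```

For $\beta = 1$ the exponent equals $1/2$ exactly, and the bound $O_p((\log(n)/n)^{1/2})$ misses $o_p(n^{-1/2})$ by a logarithmic factor.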



Thus, under certain regularity conditions, the estimators $\hat F_n$ and $\mathbb{F}_n$ are asymptotically equivalent on compact sets. Conclusions of this type are known for the Grenander estimator (cf. Kiefer and Wolfowitz (1976)) and the least squares estimator of a convex density on $[0, \infty)$ (cf. Balabdaoui and Wellner (2007)).

The result of Theorem 4.4 is also related to recent results of Giné and Nickl (2007, 2008). In the latter paper, they devise kernel density estimators with data-driven bandwidths which are also adaptive with respect to $\beta$ in a certain range, while the integrated density estimator is asymptotically equivalent to $\mathbb{F}_n$ on the whole real line. However, if $\beta > 3/2$, they must use kernel functions of higher order, that is, no longer non-negative, and simulating data from the resulting estimated density is not straightforward.

Example. Let us illustrate Theorems 4.1 and 4.4 with simulated data, again from the Gumbel distribution with $\varphi(x) = -x - e^{-x}$. Here, $\varphi''(x) = -e^{-x}$, so the assumptions of our theorems are satisfied with $\beta = 2$ for any compact interval $T$. The upper panels of Figure 3 show the true log-density $\varphi$ (dashed line) and the estimator $\hat\varphi_n$ (line) for samples of sizes $n = 200$ (left) and $n = 2000$ (right). The lower panels show the corresponding empirical processes $n^{1/2}(\mathbb{F}_n - F)$ (jagged curves) and $n^{1/2}(\hat F_n - F)$ (smooth curves). First, the quality of the estimator $\hat\varphi_n$ is quite good, even in the tails, and the quality increases with sample size, as expected. Looking at the empirical processes, the similarity between $n^{1/2}(\mathbb{F}_n - F)$ and $n^{1/2}(\hat F_n - F)$ increases with sample size, too, but rather slowly. Also, note that the estimator $\hat F_n$ outperforms $\mathbb{F}_n$ in terms of supremum distance from $F$, which leads us to the next paragraph.

Marshall's lemma. In all simulations we looked at, the estimator $\hat F_n$ satisfied the inequality

$$\|\hat F_n - F\|_\infty \le \|\mathbb{F}_n - F\|_\infty, \qquad (4)$$

provided that $f$ is indeed log-concave. Figure 3 shows two numerical examples of this phenomenon. In view of such examples and Marshall's (1970) lemma about the Grenander estimator $\hat F_n^{\mathrm{mon}}$, we first tried to verify that (4) is correct almost surely and for any $n \ge 1$. However, one can construct counterexamples showing that (4) may be violated, even if the right-hand side is multiplied by any fixed constant $C > 1$. Nevertheless, our first attempts resulted in a version of Marshall's lemma for convex density estimation; see Dümbgen et al. (2007). For the present setting, we conjecture that (4) is true with asymptotic probability one as $n \to \infty$, that is,

$$P\bigl(\|\hat F_n - F\|_\infty \le \|\mathbb{F}_n - F\|_\infty\bigr) \to 1.$$

A monotone hazard rate estimator. Estimation of a monotone hazard rate is described, 
for instance, in the book by Robertson et al. (1988). They directly solve an isotonic 






L. Dumbgen and K. Rufibach 







Figure 3. Density functions and empirical processes for Gumbel samples of size n = 200 and 
n = 2000. 

estimation problem similar to that for the Grenander density estimator. For this setting, Hall et al. (2001) and Hall and van Keilegom (2005) consider methods based upon suitable modifications of kernel estimators. Alternatively, in our setting, it follows from Lemma A.2 in the Appendix that

$$\hat h_n(x) := \frac{\hat f_n(x)}{1 - \hat F_n(x)}$$

defines a simple plug-in estimator of the hazard rate on $(-\infty, X_n)$ which is also non-decreasing. By virtue of Theorem 4.1 and Corollary 4.2, it is uniformly consistent on any compact subinterval of the interior of $\{f > 0\}$. Theorems 4.1 and 4.4 even entail a rate of convergence, as follows.



Corollary 4.5. Under the assumptions of Theorem 4.3,

$$\max_{t \in T(n,\beta)} |\hat h_n(t) - h(t)| = O_p\bigl(\rho_n^{\beta/(2\beta+1)}\bigr).$$
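Since $\hat f_n$ is piecewise log-linear, the plug-in hazard $\hat h_n$ can be evaluated exactly. The following Python sketch is illustrative only: it assumes the knots and values of $\hat\varphi_n$ are given, the function names are hypothetical, and it is restricted to $[x_0, x_m)$, where $1 - \hat F_n > 0$. By Lemma A.2, applied to the log-concave density $\hat f_n$, its output is non-decreasing in $x$.

```python
import math
from bisect import bisect_right


def _segment_mass(a, b, width):
    """Integral of exp(phi) over a segment on which phi goes linearly from a to b."""
    d = b - a
    return width * (math.exp(a) if d == 0.0 else math.exp(a) * math.expm1(d) / d)


def hazard_plugin(x, knots, phi):
    """Plug-in hazard hat f_n(x) / (1 - hat F_n(x)) for x in [x_0, x_m).

    The density exp(phi) need not be normalised: the ratio f / (1 - F)
    is unchanged when f is divided by its total mass.
    """
    if not (knots[0] <= x < knots[-1]):
        raise ValueError("x must lie in [x_0, x_m)")
    masses = [
        _segment_mass(phi[k - 1], phi[k], knots[k] - knots[k - 1])
        for k in range(1, len(knots))
    ]
    j = bisect_right(knots, x)  # x lies in [knots[j - 1], knots[j])
    t = (x - knots[j - 1]) / (knots[j] - knots[j - 1])
    phi_x = (1.0 - t) * phi[j - 1] + t * phi[j]  # linear interpolation of phi
    mass_below = sum(masses[: j - 1]) + _segment_mass(phi[j - 1], phi_x, x - knots[j - 1])
    return math.exp(phi_x) / (sum(masses) - mass_below)
```

For a long exponential segment (knots 0 and 50, $\varphi$ falling from 0 to $-50$) the returned hazard is 1 up to rounding near the left end, as expected for a unit-rate exponential.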



Estimating log-concave densities 






5. Outlook 

Starting from the results presented here, Balabdaoui et al. (2008) recently derived the pointwise limiting distribution of $\hat f_n$. They also considered the limiting distribution of $\arg\max_{x \in \mathbb{R}} \hat f_n(x)$ as an estimator of the mode of $f$. Empirical findings of Müller and Rufibach (2008) show that the estimator $\hat f_n$ is even useful for extreme value statistics. Log-concave densities also have potential as building blocks in more complex models (e.g., regression or classification) or when handling censored data (cf. Dümbgen et al. (2007a)).

Unfortunately, our proofs work only for fixed compact intervals, whereas simulations suggest that the estimators perform well on the whole real line. Presently, the authors are working on a different approach, where $\hat\varphi_n$ is represented locally as a parametric maximum likelihood estimator of a log-linear density. Presumably, this will deepen our understanding of the log-concave NPMLE's consistency properties, particularly in the tails. For instance, we conjecture that $\hat F_n$ and $\mathbb{F}_n$ are asymptotically equivalent on any interval $T$ on which $\varphi'$ is strictly decreasing.



Appendix: Auxiliary results and proofs

A.1. Two facts about log-concave densities

The following two results about a log-concave density $f = \exp\varphi$ and its distribution function $F$ are of independent interest. The first result entails that the density $f$ has at least subexponential tails.

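Lemma A.1. For arbitrary points $x_1 < x_2$,

$$\sqrt{f(x_1)f(x_2)} \le \frac{F(x_2) - F(x_1)}{x_2 - x_1}.$$

Moreover, for $x_o \in \{f > 0\}$ and any real $x \ne x_o$,

$$\frac{f(x)}{f(x_o)} \le \begin{cases} \Bigl(\dfrac{h(x_o, x)}{f(x_o)|x - x_o|}\Bigr)^2 & \text{if } f(x_o)|x - x_o| \le h(x_o, x), \\[1ex] \exp\Bigl(1 - \dfrac{f(x_o)|x - x_o|}{h(x_o, x)}\Bigr) & \text{if } f(x_o)|x - x_o| \ge h(x_o, x), \end{cases}$$

where

$$h(x_o, x) := F(\max(x_o, x)) - F(\min(x_o, x)) \le \begin{cases} F(x_o) & \text{if } x < x_o, \\ 1 - F(x_o) & \text{if } x > x_o. \end{cases}$$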
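A quick numerical illustration of the first inequality of Lemma A.1 for two log-concave examples used in this paper, the standard normal and the Gumbel density. This Python sketch is a sanity check on a finite grid, not a proof; the grid, the relative tolerance and the function names are arbitrary choices made for illustration.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    # erfc keeps precision in the lower tail
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gumbel_pdf(x):
    return math.exp(-x - math.exp(-x))


def gumbel_cdf(x):
    return math.exp(-math.exp(-x))


def lemma_a1_holds(pdf, cdf, grid, rel_tol=1e-12):
    """Check sqrt(f(x1) f(x2)) <= (F(x2) - F(x1)) / (x2 - x1) for all x1 < x2 in grid."""
    for i, x1 in enumerate(grid):
        for x2 in grid[i + 1:]:
            lhs = math.sqrt(pdf(x1) * pdf(x2))
            rhs = (cdf(x2) - cdf(x1)) / (x2 - x1)
            if lhs > rhs * (1.0 + rel_tol):
                return False
    return True
```

Both calls `lemma_a1_holds(normal_pdf, normal_cdf, grid)` and `lemma_a1_holds(gumbel_pdf, gumbel_cdf, grid)` are expected to succeed on any sorted grid, since both densities are log-concave.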

A second well-known result (Barlow and Proschan (1975), Lemma 5.8) provides further connections between the density $f$ and the distribution function $F$. In particular, it entails that $f/(F(1 - F))$ is bounded away from zero on $\{x : 0 < F(x) < 1\}$.

Lemma A.2. The function $f/F$ is non-increasing on $\{x : 0 < F(x) < 1\}$ and the function $f/(1 - F)$ is non-decreasing on $\{x : 0 < F(x) < 1\}$.

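Proof of Lemma A.1. To prove the first inequality, it suffices to consider the non-trivial case of $x_1, x_2 \in \{f > 0\}$. Concavity of $\varphi$ then entails that

$$\begin{aligned} F(x_2) - F(x_1) &\ge \int_{x_1}^{x_2} \exp\Bigl(\frac{x_2 - t}{x_2 - x_1}\varphi(x_1) + \frac{t - x_1}{x_2 - x_1}\varphi(x_2)\Bigr) \, dt \\ &= (x_2 - x_1)\int_0^1 \exp\bigl((1 - u)\varphi(x_1) + u\varphi(x_2)\bigr) \, du \\ &\ge (x_2 - x_1)\exp\Bigl(\int_0^1 \bigl((1 - u)\varphi(x_1) + u\varphi(x_2)\bigr) \, du\Bigr) \\ &= (x_2 - x_1)\exp\bigl(\varphi(x_1)/2 + \varphi(x_2)/2\bigr) \\ &= (x_2 - x_1)\sqrt{f(x_1)f(x_2)}, \end{aligned}$$

where the second inequality follows from Jensen's inequality.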

We prove the second asserted inequality only for $x > x_o$, that is, $h(x_o, x) = F(x) - F(x_o)$, the other case being handled analogously. The first part entails that

$$\frac{f(x)}{f(x_o)} \le \Bigl(\frac{h(x_o, x)}{f(x_o)(x - x_o)}\Bigr)^2,$$

and the right-hand side is not greater than one if $f(x_o)(x - x_o) \ge h(x_o, x)$. In the latter case, recall that

$$h(x_o, x) \ge (x - x_o)\int_0^1 \exp\bigl((1 - u)\varphi(x_o) + u\varphi(x)\bigr) \, du = f(x_o)(x - x_o)J\bigl(\varphi(x) - \varphi(x_o)\bigr)$$

with $\varphi(x) - \varphi(x_o) \le 0$, where $J(y) := \int_0^1 \exp(uy) \, du$. Elementary calculations show that $J(-r) = (1 - e^{-r})/r \ge 1/(1 + r)$ for arbitrary $r > 0$. Thus,

$$h(x_o, x) \ge \frac{f(x_o)(x - x_o)}{1 + \varphi(x_o) - \varphi(x)},$$

which is equivalent to $f(x)/f(x_o) \le \exp\bigl(1 - f(x_o)(x - x_o)/h(x_o, x)\bigr)$. $\square$
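The elementary bound $J(-r) \ge 1/(1 + r)$ is equivalent to $(1 - e^{-r})(1 + r) \ge r$, that is, to $e^{r} \ge 1 + r$. The following Python lines (a sketch with a hypothetical function name) evaluate $J$ via `math.expm1` and confirm the bound on a grid of $r$ values.

```python
import math


def J(y):
    """J(y) = integral of exp(u * y) over u in [0, 1], i.e. (e^y - 1) / y."""
    return 1.0 if y == 0.0 else math.expm1(y) / y


# J(-r) = (1 - e^{-r}) / r should dominate 1 / (1 + r) for every r > 0
for k in range(1, 2001):
    r = 0.025 * k
    assert J(-r) >= 1.0 / (1.0 + r)
```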



A.2. Proofs of the characterizations



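Proof of Theorem 2.2. In view of Theorem 2.1, we may restrict our attention to concave and real-valued functions $\varphi$ on $[X_1, X_n]$ and set $\varphi := -\infty$ on $\mathbb{R} \setminus [X_1, X_n]$. The set $\mathcal{C}_n$ of all such functions is a convex cone and for any function $\Delta: \mathbb{R} \to \mathbb{R}$ and $t > 0$, concavity of $\varphi + t\Delta$ on $\mathbb{R}$ is equivalent to its concavity on $[X_1, X_n]$.

One can easily verify that $\Psi_n$ is a concave and real-valued functional on $\mathcal{C}_n$. Hence, as is well known from convex analysis, a function $\varphi \in \mathcal{C}_n$ maximizes $\Psi_n$ if and only if

$$\lim_{t \downarrow 0} \frac{\Psi_n(\varphi + t(\tilde\varphi - \varphi)) - \Psi_n(\varphi)}{t} \le 0$$

for all $\tilde\varphi \in \mathcal{C}_n$. But this is equivalent to the requirement that

$$\lim_{t \downarrow 0} \frac{\Psi_n(\varphi + t\Delta) - \Psi_n(\varphi)}{t} \le 0$$

for any function $\Delta: \mathbb{R} \to \mathbb{R}$ such that $\varphi + \lambda\Delta$ is concave for some $\lambda > 0$. The assertion of the theorem now follows from

$$\lim_{t \downarrow 0} \frac{\Psi_n(\varphi + t\Delta) - \Psi_n(\varphi)}{t} = \int \Delta \, d\mathbb{F}_n - \int \Delta(x)\exp\varphi(x) \, dx. \qquad \square$$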



Proof of Theorem 2.4. We start with a general observation. Let $G$ be some distribution (function) with support $[X_1, X_n]$ and let $\Delta: [X_1, X_n] \to \mathbb{R}$ be absolutely continuous with $L_1$-derivative $\Delta'$. It then follows from Fubini's theorem that

$$\int \Delta \, dG = \Delta(X_n) - \int_{X_1}^{X_n} \Delta'(r)G(r) \, dr. \qquad (A.1)$$
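Identity (A.1) is an integration-by-parts formula. As a small check, the Python sketch below takes $G = \mathbb{F}_n$ and the perturbation $\Delta(x) = \min(x, t) + c$ used in the next step of the proof (so that $\Delta'(r) = 1\{r \le t\}$), and evaluates both sides; the function names and the constant shift $c$ are illustrative assumptions.

```python
def integral_ecdf(xs, t):
    """Integral of the empirical c.d.f. F_n over [X_1, t], for X_1 <= t <= X_n.

    xs -- sorted, distinct observations X_1 < ... < X_n; in 0-based indexing
          F_n equals (i + 1)/n on [xs[i], xs[i + 1])
    """
    n = len(xs)
    total = 0.0
    for i in range(n - 1):
        lo, hi = xs[i], min(xs[i + 1], t)
        if hi > lo:
            total += (i + 1) / n * (hi - lo)
    return total


def check_a1(xs, t, shift=0.0):
    """Both sides of (A.1) for G = F_n and Delta(x) = min(x, t) + shift."""
    lhs = sum(min(x, t) + shift for x in xs) / len(xs)  # int Delta dF_n
    delta_at_xn = min(xs[-1], t) + shift  # Delta(X_n)
    rhs = delta_at_xn - integral_ecdf(xs, t)  # Delta' = 1{r <= t}
    return lhs, rhs
```

For the observations 0.3, 1.0, 1.7, 2.9, 4.2 with $t = 2$ and $c = 5$, both sides equal 6.4; with $t = X_n$ and $c = 0$, both equal the sample mean 2.02.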

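Now, suppose that $\varphi = \hat\varphi_n$ and let $t \in (X_1, X_n]$. Let $\Delta$ be absolutely continuous on $[X_1, X_n]$ with $L_1$-derivative $\Delta'(r) = 1\{r \le t\}$ and arbitrary value of $\Delta(X_n)$. Clearly, $\varphi + \Delta$ is concave, whence (2) and (A.1) entail that

$$\Delta(X_n) - \int_{X_1}^t \mathbb{F}_n(r) \, dr \le \Delta(X_n) - \int_{X_1}^t F(r) \, dr,$$

which is equivalent to inequality (3). In the case of $t \in \mathcal{S}_n(\varphi) \setminus \{X_1\}$, let $\Delta'(r) = -1\{r \le t\}$. Then $\varphi + \lambda\Delta$ is concave for some $\lambda > 0$, so that

$$\Delta(X_n) + \int_{X_1}^t \mathbb{F}_n(r) \, dr \le \Delta(X_n) + \int_{X_1}^t F(r) \, dr,$$

which yields equality in (3).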

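Now, suppose that $\varphi$ satisfies inequality (3) for all $t$, with equality if $t \in \mathcal{S}_n(\varphi)$. In view of Theorem 2.1 and the proof of Theorem 2.2, it suffices to show that (2) holds for any function $\Delta$ defined on $[X_1, X_n]$ which is linear on each interval $[X_j, X_{j+1}]$, $1 \le j < n$, while $\varphi + \lambda\Delta$ is concave for some $\lambda > 0$. The latter requirement is equivalent to $\Delta$ being concave between two consecutive knots of $\varphi$. Elementary considerations show that the $L_1$-derivative of such a function $\Delta$ may be written as

$$\Delta'(r) = \sum_{j=2}^n \beta_j 1\{r \le X_j\},$$

with real numbers $\beta_2, \ldots, \beta_n$ such that

$$\beta_j \ge 0 \quad \text{if } X_j \notin \mathcal{S}_n(\varphi).$$

Consequently, it follows from (A.1) and our assumptions on $\varphi$ that

$$\int \Delta \, d\mathbb{F}_n = \Delta(X_n) - \sum_{j=2}^n \beta_j\int_{X_1}^{X_j} \mathbb{F}_n(r) \, dr \le \Delta(X_n) - \sum_{j=2}^n \beta_j\int_{X_1}^{X_j} F(r) \, dr = \int \Delta \, dF. \qquad \square$$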



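Proof of Corollary 2.5. For $t \in \mathcal{S}_n(\hat\varphi_n)$ and $s < t < u$, it follows from Theorem 2.4 that

$$\frac{1}{u - t}\int_t^u \hat F_n(r) \, dr \le \frac{1}{u - t}\int_t^u \mathbb{F}_n(r) \, dr$$

and

$$\frac{1}{t - s}\int_s^t \hat F_n(r) \, dr \ge \frac{1}{t - s}\int_s^t \mathbb{F}_n(r) \, dr.$$

Letting $u \downarrow t$ and $s \uparrow t$ yields

$$\hat F_n(t) \le \mathbb{F}_n(t) \quad \text{and} \quad \hat F_n(t) \ge \mathbb{F}_n(t-) = \mathbb{F}_n(t) - n^{-1}. \qquad \square$$

A.3. Proof of $\hat\varphi_n$'s consistency

Our proof of Theorem 4.1 involves a refinement and modification of methods introduced by Dümbgen et al. (2004). A first key ingredient is an inequality for concave functions due to Dümbgen (1998) (see also Dümbgen et al. (2004) or Rufibach (2006)).

Lemma A.3. For any $\beta \in [1, 2]$ and $L > 0$, there exists a constant $K = K(\beta, L) \in (0, 1]$ with the following property. Suppose that $g$ and $\hat g$ are concave and real-valued functions on a compact interval $T = [A, B]$, where $g \in \mathcal{H}^{\beta,L}(T)$. Let $\epsilon > 0$ and $0 < \delta \le K\min\{B - A, \epsilon^{1/\beta}\}$. Then

$$\sup_{t \in T} (\hat g - g)(t) \ge \epsilon \quad \text{or} \quad \sup_{t \in [A + \delta, B - \delta]} (g - \hat g)(t) \ge \epsilon$$

implies that

$$\inf_{t \in [c, c + \delta]} (\hat g - g)(t) \ge \epsilon/4 \quad \text{or} \quad \inf_{t \in [c, c + \delta]} (g - \hat g)(t) \ge \epsilon/4$$

for some $c \in [A, B - \delta]$.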
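Starting from this lemma, let us first sketch the idea of our proof of Theorem 4.1. Suppose we had a family $\mathcal{D}$ of measurable functions $\Delta$ with finite seminorm

$$\sigma(\Delta) := \Bigl(\int \Delta^2 \, dF\Bigr)^{1/2}$$

such that

$$\sup_{\Delta \in \mathcal{D}} \frac{|\int \Delta \, d(\mathbb{F}_n - F)|}{\sigma(\Delta)\rho_n^{1/2}} \le C \qquad (A.2)$$

with asymptotic probability one, where $C > 0$ is some constant. If, in addition, $\varphi - \hat\varphi_n \in \mathcal{D}$ and $|\varphi - \hat\varphi_n| \le C$ with asymptotic probability one, then we could conclude that

$$\Bigl|\int (\varphi - \hat\varphi_n) \, d(\mathbb{F}_n - F)\Bigr| \le C\sigma(\varphi - \hat\varphi_n)\rho_n^{1/2},$$

while Theorem 2.2, applied to $\Delta := \varphi - \hat\varphi_n$, entails that

$$\int (\varphi - \hat\varphi_n) \, d(\mathbb{F}_n - F) \le \int (\varphi - \hat\varphi_n) \, d(\hat F_n - F) = -\int \Delta\bigl(1 - \exp(-\Delta)\bigr) \, dF \le -(1 + C)^{-1}\int \Delta^2 \, dF$$

because $y(1 - \exp(-y)) \ge (1 + y_+)^{-1}y^2$ for all real $y$, where $y_+ := \max(y, 0)$. Hence, with asymptotic probability one,

$$\sigma(\varphi - \hat\varphi_n)^2 \le C^2(1 + C)^2\rho_n.$$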
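The pointwise inequality $y(1 - e^{-y}) \ge (1 + y_+)^{-1}y^2$ is what turns the likelihood comparison into a bound on $\sigma(\varphi - \hat\varphi_n)^2$. For $y > 0$ it is the bound $J(-y) \ge 1/(1 + y)$ from the proof of Lemma A.1, and for $y = -s < 0$ it reduces to $e^{s} - 1 \ge s$. A grid check in Python, with hypothetical function names, reads:

```python
import math


def lhs_term(y):
    """y * (1 - exp(-y)), computed with expm1 for accuracy near zero."""
    return -y * math.expm1(-y)


def quadratic_bound(y):
    """y^2 / (1 + y_+), where y_+ = max(y, 0)."""
    return y * y / (1.0 + max(y, 0.0))


# the inequality should hold on the whole grid y in [-20, 20]
for k in range(-2000, 2001):
    y = 0.01 * k
    assert lhs_term(y) >= quadratic_bound(y) - 1e-15 * max(1.0, quadratic_bound(y))
```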

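Now, suppose that $|\varphi - \hat\varphi_n| \ge \epsilon_n$ on a subinterval of $T = [A, B]$ of length $\epsilon_n^{1/\beta}$, where $(\epsilon_n)_n$ is a fixed sequence of numbers $\epsilon_n > 0$ tending to zero. Then $\sigma(\varphi - \hat\varphi_n)^2 \ge \epsilon_n^{(2\beta+1)/\beta}\min_T(f)$, so that

$$\epsilon_n \le \tilde C\rho_n^{\beta/(2\beta+1)} \quad \text{with } \tilde C = \bigl(C^2(1 + C)^2/\min_T(f)\bigr)^{\beta/(2\beta+1)}.$$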

The previous considerations will be modified in two aspects to get a rigorous proof of Theorem 4.1. For technical reasons, we must replace the denominator $\sigma(\Delta)\rho_n^{1/2}$ of inequality (A.2) with $\sigma(\Delta)\rho_n^{1/2} + W(\Delta)\rho_n^{2/3}$, where

$$W(\Delta) := \sup_{x \in \mathbb{R}} \frac{|\Delta(x)|}{\max(1, |\varphi(x)|)}.$$

This is necessary to deal with functions $\Delta$ with small values of $F(\{\Delta \ne 0\})$. Moreover, we shall work with simple "caricatures" of $\varphi - \hat\varphi_n$, namely, functions which are piecewise linear with at most three knots. Throughout this section, piecewise linearity does not necessarily imply continuity. A function being piecewise linear with at most $m$ knots means that the real line may be partitioned into $m + 1$ non-degenerate intervals on each of which the function is linear. Then the $m$ real boundary points of these intervals are the knots.

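The next lemma extends inequality (2) to certain piecewise linear functions.

Lemma A.4. Let $\Delta: \mathbb{R} \to \mathbb{R}$ be piecewise linear such that each knot $q$ of $\Delta$ satisfies one of the following two properties:

$$q \in \mathcal{S}_n(\hat\varphi_n) \quad \text{and} \quad \Delta(q) = \liminf_{x \to q} \Delta(x); \qquad (A.3)$$
$$\Delta(q) = \lim_{r \to q} \Delta(r) \quad \text{and} \quad \Delta'(q-) \ge \Delta'(q+). \qquad (A.4)$$

Then

$$\int \Delta \, d\mathbb{F}_n \le \int \Delta \, d\hat F_n. \qquad (A.5)$$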

We can now specify the "caricatures" mentioned above. 

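Lemma A.5. Let $T = [A, B]$ be a fixed subinterval of the interior of $\{f > 0\}$. Let $\varphi - \hat\varphi_n \ge \epsilon$ or $\hat\varphi_n - \varphi \ge \epsilon$ on some interval $[c, c + \delta] \subset T$ with length $\delta > 0$, and suppose that $X_1 < c$ and $X_n > c + \delta$. There then exists a piecewise linear function $\Delta$ with at most three knots, each of which satisfies condition (A.3) or (A.4), and a positive constant $K' = K'(f, T)$ such that

$$|\varphi - \hat\varphi_n| \ge \epsilon|\Delta|, \qquad (A.6)$$
$$\Delta(\varphi - \hat\varphi_n) \ge 0, \qquad (A.7)$$
$$\Delta \le 1, \qquad (A.8)$$
$$\int_c^{c+\delta} \Delta^2(x) \, dx \ge \delta/3, \qquad (A.9)$$
$$W(\Delta) \le K'\delta^{-1/2}\sigma(\Delta). \qquad (A.10)$$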

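Our last ingredient is a surrogate for (A.2).

Lemma A.6. Let $\mathcal{D}_m$ be the family of all piecewise linear functions on $\mathbb{R}$ with at most $m$ knots. There exists a constant $K'' = K''(f)$ such that

$$\sup_{m \ge 1, \, \Delta \in \mathcal{D}_m} \frac{|\int \Delta \, d(\mathbb{F}_n - F)|}{\sigma(\Delta)m^{1/2}\rho_n^{1/2} + W(\Delta)m\rho_n^{2/3}} \le K''$$

with asymptotic probability one.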
Before we verify all of these auxiliary results, let us proceed with the main proof.

Proof of Theorem 4.1. Suppose that

$$\sup_{t \in T}(\hat\varphi_n - \varphi)(t) \ge C\epsilon_n$$

or

$$\sup_{t \in [A + \delta_n,\, B - \delta_n]}(\varphi - \hat\varphi_n)(t) \ge C\epsilon_n$$

for some constant $C > 0$, where $\epsilon_n := \rho_n^{\beta/(2\beta+1)}$ and $\delta_n := \rho_n^{1/(2\beta+1)} = \epsilon_n^{1/\beta}$. It follows from Lemma A.3 with $\epsilon := C\epsilon_n$ that, if $C$ exceeds a suitable constant and $n$ is sufficiently large, there is a (random) interval $[c_n, c_n + \delta_n] \subset T$ on which either $\hat\varphi_n - \varphi \ge (C/4)\epsilon_n$ or $\varphi - \hat\varphi_n \ge (C/4)\epsilon_n$. But then there is a (random) function $\Delta_n \in \mathcal{D}_3$ fulfilling the conditions stated in Lemma A.5. For this $\Delta_n$, it follows from (A.5) that

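$$\int_{\mathbb{R}}\Delta_n\,d(F - \mathbb{F}_n) \ge \int_{\mathbb{R}}\Delta_n\,d(F - \hat F_n) = \int_{\mathbb{R}}\Delta_n\bigl(1 - \exp[-(\varphi - \hat\varphi_n)]\bigr)\,dF. \tag{A.11}$$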

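With $\tilde\Delta_n := (C/4)\epsilon_n\Delta_n$, it follows from (A.6) and (A.7) that the right-hand side of (A.11) is not smaller than

$$\frac{4}{C}\epsilon_n^{-1}\int\tilde\Delta_n\bigl(1 - \exp(-\tilde\Delta_n)\bigr)\,dF \ge \frac{4}{C}\epsilon_n^{-1}\,\frac{\sigma(\tilde\Delta_n)^2}{1 + (C/4)\epsilon_n} = \frac{(C/4)\,\epsilon_n\,\sigma(\Delta_n)^2}{1 + o(1)},$$

because $\tilde\Delta_n \le (C/4)\epsilon_n$ by (A.8) and $x(1 - e^{-x}) \ge x^2/(1 + a)$ whenever $x \le a$ and $a \ge 0$. On the other hand, according to Lemma A.6, we may assume that

$$\begin{aligned}
\int\Delta_n\,d(F - \mathbb{F}_n) &\le K''\bigl(3^{1/2}\sigma(\Delta_n)\rho_n^{1/2} + 3W(\Delta_n)\rho_n^{2/3}\bigr)\\
&\le K''\bigl(3^{1/2}\rho_n^{1/2} + 3K'\delta_n^{-1/2}\rho_n^{2/3}\bigr)\sigma(\Delta_n) \qquad \text{(by (A.10))}\\
&= K''\bigl(3^{1/2}\rho_n^{1/2} + 3K'\rho_n^{2/3 - 1/(4\beta+2)}\bigr)\sigma(\Delta_n)\\
&\le G\rho_n^{1/2}\sigma(\Delta_n)
\end{aligned}$$

for some constant $G = G(\beta, L, f, T)$, because $2/3 - 1/(4\beta + 2) \ge 2/3 - 1/6 = 1/2$ for $\beta \ge 1$. Consequently,

$$C^2 \le \frac{16G^2(1 + o(1))\,\epsilon_n^{-2}\rho_n}{\sigma(\Delta_n)^2} = \frac{16G^2(1 + o(1))}{\delta_n^{-1}\sigma(\Delta_n)^2} \le \frac{48G^2(1 + o(1))}{\min_T(f)},$$

where the last inequality follows from (A.9), because $\sigma(\Delta_n)^2 \ge \min_T(f)\int_{c_n}^{c_n+\delta_n}\Delta_n(x)^2\,dx \ge \min_T(f)\,\delta_n/3$. □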
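The exponent bookkeeping in this proof can be checked mechanically. The sketch below assumes $\beta \in [1, 2]$ and tracks only powers of $\rho_n$ with exact rational arithmetic; it verifies $\delta_n = \epsilon_n^{1/\beta}$, $\epsilon_n^{-2}\rho_n = \delta_n$ and $2/3 - 1/(4\beta+2) \ge 1/2$, and spot-checks the elementary inequality $x(1 - e^{-x}) \ge x^2/(1 + a)$ for $x \le a$ used in the lower bound. The sampled values of $\beta$, $a$ and $x$ are arbitrary choices for illustration.

```python
import math
from fractions import Fraction as Fr

# Every quantity below is a power of rho_n, so only exponents are tracked.
for beta in (Fr(1), Fr(5, 4), Fr(3, 2), Fr(2)):  # assumed range beta in [1, 2]
    eps = beta / (2 * beta + 1)      # eps_n   = rho_n ** eps
    dlt = 1 / (2 * beta + 1)         # delta_n = rho_n ** dlt
    assert dlt == eps / beta         # delta_n = eps_n ** (1 / beta)
    assert 1 - 2 * eps == dlt        # eps_n ** -2 * rho_n = delta_n
    w_term = Fr(2, 3) - dlt / 2      # exponent of delta_n ** (-1/2) * rho_n ** (2/3)
    assert w_term == Fr(2, 3) - 1 / (4 * beta + 2)
    assert w_term >= Fr(1, 2)        # hence this term is O(rho_n ** (1/2))


def lower_bound_ok(x, a):
    """Check x (1 - e^{-x}) >= x^2 / (1 + a), intended for x <= a and a >= 0."""
    return x * (1 - math.exp(-x)) >= x * x / (1 + a) - 1e-15


for a in (0.0, 0.05, 0.5, 2.0):
    xs = [a * k / 200 for k in range(201)] + [-5 * k / 200 for k in range(1, 201)]
    assert all(lower_bound_ok(x, a) for x in xs)
print("exponent and elementary inequality checks passed")
```

For $\beta = 1$ the bound $2/3 - 1/(4\beta+2) \ge 1/2$ holds with equality, which is why the argument needs $\beta \ge 1$.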
Proof of Lemma A.4. There is a sequence of continuous, piecewise linear functions $\Delta_k$ converging pointwise isotonically to $\Delta$ as $k \to \infty$ such that any knot $q$ of $\Delta_k$ either belongs to $S_n(\hat\varphi_n)$ or satisfies $\Delta_k'(q-) \ge \Delta_k'(q+)$. Thus, $\hat\varphi_n + \lambda\Delta_k$ is concave for sufficiently small $\lambda > 0$. Consequently, since $\Delta_1 \le \Delta_k \le \Delta$ for all $k$, it follows from dominated convergence and (2) that

$$\int\Delta\,d\mathbb{F}_n = \lim_{k \to \infty}\int\Delta_k\,d\mathbb{F}_n \le \lim_{k \to \infty}\int\Delta_k\,d\hat F_n = \int\Delta\,d\hat F_n. \qquad \Box$$

Proof of Lemma A.5. The crucial point in all the cases we must distinguish is to construct a $\Delta \in \mathcal{D}_3$ satisfying the assumptions of Lemma A.4 and (A.6)-(A.9). Recall that $\hat\varphi_n$ is piecewise linear.

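Case 1a: $\hat\varphi_n - \varphi \ge \epsilon$ on $[c, c + \delta]$ and $S_n(\hat\varphi_n) \cap (c, c + \delta) \ne \emptyset$. Here, we choose a continuous function $\Delta \in \mathcal{D}_3$ with knots $c$, $c + \delta$ and $x_o \in S_n(\hat\varphi_n) \cap (c, c + \delta)$, where $\Delta := 0$ on $(-\infty, c] \cup [c + \delta, \infty)$ and $\Delta(x_o) := -1$. Here, the assumptions of Lemma A.4 and requirements (A.6)-(A.9) are easily verified.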
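A quick numerical check of the Case 1a construction: whatever the position of the knot $x_o$ in $(c, c + \delta)$, the tent function satisfies $\int_c^{c+\delta}\Delta(x)^2\,dx = \delta/3$, which is where the constant in (A.9) comes from. The values of $c$, $\delta$ and $x_o$ are hypothetical, and the trapezoidal rule is simply our choice of quadrature for this sketch.

```python
def case_1a_tent(c, delta, x_o):
    """Continuous Delta with knots c, x_o, c + delta: zero outside (c, c + delta), -1 at x_o."""
    def Delta(x):
        if x <= c or x >= c + delta:
            return 0.0
        if x <= x_o:
            return -(x - c) / (x_o - c)
        return -(c + delta - x) / (c + delta - x_o)
    return Delta


def integral_of_square(g, a, b, n=100_000):
    """Composite trapezoidal approximation of the integral of g(x)**2 over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) ** 2 + g(b) ** 2)
    total += sum(g(a + i * h) ** 2 for i in range(1, n))
    return total * h


c, delta = 0.3, 0.2
for x_o in (0.32, 0.40, 0.49):  # hypothetical knot positions of phi_hat_n in (c, c + delta)
    value = integral_of_square(case_1a_tent(c, delta, x_o), c, c + delta)
    print(f"x_o = {x_o:.2f}: integral = {value:.6f}, delta/3 = {delta / 3:.6f}")
```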

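Case 1b: $\hat\varphi_n - \varphi \ge \epsilon$ on $[c, c + \delta]$ and $S_n(\hat\varphi_n) \cap (c, c + \delta) = \emptyset$. Let $[c_o, d_o] \supset [c, c + \delta]$ be the maximal interval on which $\varphi - \hat\varphi_n$ is concave. There then exists a linear function $\tilde\Delta$ such that $\tilde\Delta \ge \varphi - \hat\varphi_n$ on $[c_o, d_o]$ and $\tilde\Delta \le -\epsilon$ on $[c, c + \delta]$. Next, let $(c_1, d_1) := \{\tilde\Delta < 0\} \cap (c_o, d_o)$. We now define $\Delta \in \mathcal{D}_2$ via

$$\Delta(x) := \begin{cases} 0, & x \in (-\infty, c_1) \cup (d_1, \infty),\\ \tilde\Delta(x)/\epsilon, & x \in [c_1, d_1]. \end{cases}$$

Again, the assumptions of Lemma A.4 and requirements (A.6)-(A.9) are easily verified; this time, we even know that $\Delta \le -1$ on $[c, c + \delta]$, whence $\int_c^{c+\delta}\Delta(x)^2\,dx \ge \delta$. Figure 4 illustrates this construction.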

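Case 2: $\varphi - \hat\varphi_n \ge \epsilon$ on $[c, c + \delta]$. Let $[c_o, c]$ and $[c + \delta, d_o]$ be maximal intervals on which $\hat\varphi_n$ is linear. We then define

$$\Delta(x) := \begin{cases} 0, & x \in (-\infty, c_o) \cup (d_o, \infty),\\ 1 + \beta_1(x - x_o), & x \in [c_o, x_o],\\ 1 + \beta_2(x - x_o), & x \in [x_o, d_o], \end{cases}$$

where $x_o := c + \delta/2$ and $\beta_1 > 0$ is chosen such that either

$\Delta(c_o) = 0$ and $(\varphi - \hat\varphi_n)(c_o) \ge 0$, or
$(\varphi - \hat\varphi_n)(c_o) < 0$ and $\operatorname{sign}(\Delta) = \operatorname{sign}(\varphi - \hat\varphi_n)$ on $[c_o, x_o]$.

Analogously, $\beta_2 < 0$ is chosen such that either

$\Delta(d_o) = 0$ and $(\varphi - \hat\varphi_n)(d_o) \ge 0$, or
$(\varphi - \hat\varphi_n)(d_o) < 0$ and $\operatorname{sign}(\Delta) = \operatorname{sign}(\varphi - \hat\varphi_n)$ on $[x_o, d_o]$.

Again, the assumptions of Lemma A.4 and requirements (A.6)-(A.9) are easily verified. Figure 5 depicts an example.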
It remains to verify requirement (A.10) for our particular functions $\Delta$. Note that, by our assumption on $T = [A, B]$, there exist numbers $\tau, C_o > 0$ such that $f \ge C_o$ on $T_o := [A - \tau, B + \tau]$.

In Case 1a, $W(\Delta) \le \|\Delta\|_\infty = 1$, whereas $\sigma(\Delta)^2 \ge C_o\int_c^{c+\delta}\Delta(x)^2\,dx = C_o\delta/3$. Hence, (A.10) is satisfied if $K' \ge (3/C_o)^{1/2}$.

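For Cases 1b and 2, we start with a more general consideration. Let $h(x) := 1\{x \in Q\}(\alpha + \gamma x)$ for real numbers $\alpha, \gamma$ and a non-degenerate interval $Q$ containing some point in $(c, c + \delta)$. Let $Q \cap T_o$ have end-points $x_o < y_o$, and write $\|h\|_{T_o} := \sup_{x \in T_o}|h(x)|$. Elementary considerations then reveal that

$$\sigma(h)^2 \ge C_o\int_{x_o}^{y_o}(\alpha + \gamma x)^2\,dx \ge \frac{C_o}{4}(y_o - x_o)\|h\|_{T_o}^2.$$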
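The "elementary considerations" rest on the fact that a linear function $g$ on an interval of length $L$ satisfies $\int g^2 \ge L\sup|g|^2/4$, with equality when $g$ runs from $-1/2$ to $1$ (up to scaling). The Python check below uses the exact formula $L(g_0^2 + g_0g_1 + g_1^2)/3$ for the integral of $g^2$ in terms of the end-point values $g_0, g_1$; the random end-point values and helper names are arbitrary choices for illustration.

```python
import random


def sq_integral_linear(g0, g1, length):
    """Exact integral of g(x)^2 over an interval of the given length, where g is
    linear with end-point values g0 and g1: length * (g0^2 + g0*g1 + g1^2) / 3."""
    return length * (g0 * g0 + g0 * g1 + g1 * g1) / 3.0


def normalized_ratio(g0, g1):
    """(mean of g^2 over the interval) / (sup of |g| over the interval)^2."""
    return sq_integral_linear(g0, g1, 1.0) / max(abs(g0), abs(g1)) ** 2


random.seed(0)
worst = min(normalized_ratio(random.uniform(-1, 1), random.uniform(-1, 1))
            for _ in range(100_000))
print(f"smallest ratio found: {worst:.4f}; extremal case g0 = -1/2, g1 = 1:",
      normalized_ratio(-0.5, 1.0))
```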

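We now deduce an upper bound for $W(h)/\|h\|_{T_o}$. If $Q \subset T_o$ or $\gamma = 0$, then $W(h)/\|h\|_{T_o} \le 1$. Now, suppose that $\gamma \ne 0$ and $Q \not\subset T_o$. Then $x_o, y_o \in T_o$ satisfy $y_o - x_o \ge \tau$ and, without loss of generality, we may assume that $\gamma = -1$. Now,

$$\begin{aligned}
\|h\|_{T_o} &= \max(|\alpha - x_o|, |\alpha - y_o|)\\
&= (y_o - x_o)/2 + |\alpha - (x_o + y_o)/2|\\
&\ge \tau/2 + |\alpha - (x_o + y_o)/2|.
\end{aligned}$$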
Figure 5. The perturbation function $\Delta$ in Case 2.



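On the other hand, since $\varphi(x) \le a_o - b_o|x|$ for certain constants $a_o, b_o > 0$,

$$\begin{aligned}
W(h) &\le \sup_{x \in \mathbb{R}}\frac{|\alpha - x|}{\max(1, b_o|x| - a_o)}\\
&\le \sup_{x \in \mathbb{R}}\frac{|\alpha| + |x|}{\max(1, b_o|x| - a_o)}\\
&= |\alpha| + (a_o + 1)/b_o\\
&\le |\alpha - (x_o + y_o)/2| + \max(|A|, |B|) + \tau/2 + (a_o + 1)/b_o,
\end{aligned}$$

where the last step uses $(x_o + y_o)/2 \in [A - \tau/2, B + \tau/2]$.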

This entails that, in all cases,

$$\frac{W(h)}{\|h\|_{T_o}} \le C_* := \max\Bigl(1, \frac{\max(|A|, |B|) + \tau/2 + (a_o + 1)/b_o}{\tau/2}\Bigr).$$
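The equality $\sup_x (|\alpha| + |x|)/\max(1, b_o|x| - a_o) = |\alpha| + (a_o + 1)/b_o$ in the bound on $W(h)$ holds because the ratio increases up to $|x| = (a_o + 1)/b_o$ and decreases afterwards. The sketch below compares a grid approximation with this closed form for a few hypothetical parameter triples; the grid range and resolution are assumptions of the example.

```python
def sup_on_grid(alpha, a_o, b_o, x_max=1_000.0, n=200_001):
    """Grid approximation of sup_x (|alpha| + |x|) / max(1, b_o |x| - a_o)."""
    best = 0.0
    for i in range(n):
        x = -x_max + 2.0 * x_max * i / (n - 1)
        best = max(best, (abs(alpha) + abs(x)) / max(1.0, b_o * abs(x) - a_o))
    return best


for alpha, a_o, b_o in [(0.0, 1.0, 0.5), (2.0, 0.3, 2.0), (-1.5, 2.0, 1.0)]:
    closed_form = abs(alpha) + (a_o + 1.0) / b_o
    print(alpha, a_o, b_o, round(sup_on_grid(alpha, a_o, b_o), 4), closed_form)
```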

In Case 1b, our function $\Delta$ is of the same type as $h$ above and $y_o - x_o \ge \delta$. Thus,

$$W(\Delta) \le C_*\|\Delta\|_{T_o} \le 2C_*C_o^{-1/2}\delta^{-1/2}\sigma(\Delta).$$






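In Case 2, $\Delta$ may be written as $h_1 + h_2$, with two functions $h_1$ and $h_2$ of the same type as $h$ above having disjoint supports and both satisfying $y_o - x_o \ge \delta/2$. Thus,

$$\begin{aligned}
W(\Delta) &= \max(W(h_1), W(h_2))\\
&\le 2^{3/2}C_*C_o^{-1/2}\delta^{-1/2}\max(\sigma(h_1), \sigma(h_2))\\
&\le 2^{3/2}C_*C_o^{-1/2}\delta^{-1/2}\sigma(\Delta). \qquad \Box
\end{aligned}$$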

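To prove Lemma A.6, we need a simple exponential inequality.

Lemma A.7. Let $Y$ be a random variable such that $E(Y) = 0$, $E(Y^2) = \sigma^2$ and $C := E\exp(|Y|) < \infty$. Then, for arbitrary $t \in \mathbb{R}$,

$$E\exp(tY) \le 1 + \frac{\sigma^2t^2}{2} + \frac{C|t|^3}{(1 - |t|)_+},$$

where the last term is interpreted as $\infty$ if $|t| \ge 1$.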

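Proof.

$$E\exp(tY) = \sum_{k=0}^\infty\frac{t^k}{k!}E(Y^k) \le 1 + \frac{\sigma^2t^2}{2} + \sum_{k=3}^\infty\frac{|t|^k}{k!}E(|Y|^k).$$

For any $y \ge 0$ and integers $k \ge 3$, $y^ke^{-y} \le k^ke^{-k}$. Thus, $E(|Y|^k) \le E\exp(|Y|)\,k^ke^{-k} = Ck^ke^{-k}$. Since $k^ke^{-k} \le k!$, which can be verified easily via induction on $k$,

$$E\exp(tY) \le 1 + \frac{\sigma^2t^2}{2} + C\sum_{k=3}^\infty|t|^k = 1 + \frac{\sigma^2t^2}{2} + \frac{C|t|^3}{(1 - |t|)_+}. \qquad \Box$$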
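The two numerical facts behind Lemma A.7, namely $y^ke^{-y} \le k^ke^{-k}$ for $y \ge 0$ and $k^ke^{-k} \le k!$, together with the resulting bound, can be sanity-checked as follows. The symmetric two-point distribution $Y = \pm s$ is a hypothetical test case, chosen only because its moment generating function $\cosh(ts)$ is explicit; the helper name `lemma_a7_bound` is ours.

```python
import math

# Step 1: y^k e^{-y} <= k^k e^{-k} for y >= 0 (the maximum is attained at y = k).
for k in range(3, 15):
    peak = k ** k * math.exp(-k)
    assert all(y ** k * math.exp(-y) <= peak * (1 + 1e-12)
               for y in (j / 100 for j in range(0, 5001)))

# Step 2: k^k e^{-k} <= k!, the fact "verified easily via induction on k".
assert all(k ** k * math.exp(-k) <= math.factorial(k) for k in range(1, 101))


def lemma_a7_bound(t, sigma2, C):
    """Right-hand side of Lemma A.7; infinite when |t| >= 1."""
    if abs(t) >= 1:
        return math.inf
    return 1 + sigma2 * t * t / 2 + C * abs(t) ** 3 / (1 - abs(t))


# Step 3: Y = +s or -s with probability 1/2 each, so E Y = 0, E Y^2 = s^2,
# C = E exp(|Y|) = e^s and E exp(tY) = cosh(t s).
for s in (0.1, 0.5, 2.0):
    for i in range(-150, 151):
        t = i / 100
        assert math.cosh(t * s) <= lemma_a7_bound(t, s * s, math.exp(s))
print("Lemma A.7 sanity checks passed")
```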



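Lemma A.7 entails the following result for finite families of functions.

Lemma A.8. Let $\mathcal{H}_n$ be a finite family of functions $h$ with $0 < W(h) < \infty$ such that $\#\mathcal{H}_n = O(n^p)$ for some $p > 0$. Then, for sufficiently large $D$,

$$\lim_{n \to \infty}P\Bigl(\max_{h \in \mathcal{H}_n}\frac{|\int h\,d(\mathbb{F}_n - F)|}{\sigma(h)\rho_n^{1/2} + W(h)\rho_n^{2/3}} \ge D\Bigr) = 0.$$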

Proof. Since $W(ch) = cW(h)$ and $\sigma(ch) = c\sigma(h)$ for any $h \in \mathcal{H}_n$ and arbitrary constants $c > 0$, we may assume, without loss of generality, that $W(h) = 1$ for all $h \in \mathcal{H}_n$. Let $X$ be a random variable with log-density $\varphi$. Since

$$\limsup_{|x| \to \infty}\frac{\varphi(x)}{|x|} < 0$$

by Lemma A.1, the expectation of $\exp(t_ow(X))$ is finite for any fixed $t_o \in (0, 1)$, where $w(x) := \max(1, |\varphi(x)|)$. Hence, since $|h| \le w$,

$$E\exp\bigl(t_o|h(X) - Eh(X)|\bigr) \le C_o := \exp\bigl(t_oEw(X)\bigr)\,E\exp\bigl(t_ow(X)\bigr) < \infty.$$
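Lemma A.7, applied to $Y := t_o(h(X) - Eh(X))$, implies that

$$E\exp\bigl[t(h(X) - Eh(X))\bigr] = E\exp\bigl((t/t_o)Y\bigr) \le 1 + \frac{\sigma(h)^2t^2}{2} + \frac{C_1|t|^3}{(1 - C_2|t|)_+}$$

for arbitrary $h \in \mathcal{H}_n$, $t \in \mathbb{R}$ and constants $C_1, C_2$ depending on $t_o$ and $C_o$ (here we use $\mathrm{Var}(h(X)) \le \sigma(h)^2$). Consequently,

$$\begin{aligned}
E\exp\Bigl(t\int h\,d(\mathbb{F}_n - F)\Bigr) &= E\exp\Bigl((t/n)\sum_{i=1}^n\bigl(h(X_i) - Eh(X)\bigr)\Bigr)\\
&= \Bigl(E\exp\bigl((t/n)(h(X) - Eh(X))\bigr)\Bigr)^n\\
&\le \Bigl(1 + \frac{\sigma(h)^2t^2}{2n^2} + \frac{C_1|t|^3}{n^3(1 - C_2|t|/n)_+}\Bigr)^n\\
&\le \exp\Bigl(\frac{\sigma(h)^2t^2}{2n} + \frac{C_1|t|^3}{n^2(1 - C_2|t|/n)_+}\Bigr).
\end{aligned}$$

It now follows from Markov's inequality that

$$P\Bigl(\Bigl|\int h\,d(\mathbb{F}_n - F)\Bigr| \ge \eta\Bigr) \le 2\exp\Bigl(\frac{\sigma(h)^2t^2}{2n} + \frac{C_1t^3}{n^2(1 - C_2t/n)_+} - t\eta\Bigr) \tag{A.12}$$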

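for arbitrary $t, \eta > 0$. Specifically, let $\eta := D\bigl(\sigma(h)\rho_n^{1/2} + \rho_n^{2/3}\bigr)$ and set

$$t := \frac{n\rho_n^{1/2}}{\sigma(h) + \rho_n^{1/6}}.$$

Then $t\eta = D\log n$ and $C_2t/n \le C_2\rho_n^{1/3}$, because $n\rho_n = \log n$. Hence, the bound (A.12) is not greater than

$$\begin{aligned}
&2\exp\Bigl(\frac{\sigma(h)^2\log n}{2(\sigma(h) + \rho_n^{1/6})^2} + \frac{C_1\rho_n^{1/2}\log n}{(\sigma(h) + \rho_n^{1/6})^3(1 - C_2\rho_n^{1/3})_+} - D\log n\Bigr)\\
&\le 2\exp\Bigl(\Bigl(\frac{1}{2} + \frac{C_1}{(1 - C_2\rho_n^{1/3})_+} - D\Bigr)\log n\Bigr).
\end{aligned}$$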

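Consequently, for sufficiently large $D > 0$,

$$\begin{aligned}
P\Bigl(\max_{h \in \mathcal{H}_n}\frac{|\int h\,d(\mathbb{F}_n - F)|}{\sigma(h)\rho_n^{1/2} + W(h)\rho_n^{2/3}} \ge D\Bigr) &\le \#\mathcal{H}_n \cdot 2\exp\bigl((O(1) - D)\log n\bigr)\\
&= O(1)\exp\bigl((O(1) + p - D)\log n\bigr) \to 0. \qquad \Box
\end{aligned}$$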
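The choice of $\eta$ and $t$ in this proof is pure algebra. The sketch below evaluates the three terms of the exponent in (A.12) for hypothetical values of $n$, $\sigma(h)$, $D$, $C_1$ and $C_2$ (with the normalization $W(h) = 1$) and checks that $t\eta = D\log n$, that the quadratic term is at most $(1/2)\log n$ and that the cubic term is at most $C_1\log n/(1 - C_2\rho_n^{1/3})_+$. The function name is ours.

```python
import math


def chernoff_exponent_terms(n, sigma, D, C1, C2):
    """Evaluate the pieces of the exponent in (A.12) for the choices of eta and t
    made in the proof of Lemma A.8 (normalization W(h) = 1)."""
    rho = math.log(n) / n
    t = n * math.sqrt(rho) / (sigma + rho ** (1 / 6))
    eta = D * (sigma * math.sqrt(rho) + rho ** (2 / 3))
    quadratic = sigma ** 2 * t ** 2 / (2 * n)
    denom = max(1.0 - C2 * t / n, 0.0)           # (1 - C2 t / n)_+
    cubic = math.inf if denom == 0.0 else C1 * t ** 3 / (n ** 2 * denom)
    return rho, t, eta, quadratic, cubic


n, D, C1, C2 = 10 ** 6, 5.0, 2.0, 3.0            # hypothetical values
for sigma in (0.0, 0.01, 1.0, 10.0):
    rho, t, eta, quadratic, cubic = chernoff_exponent_terms(n, sigma, D, C1, C2)
    log_n = math.log(n)
    assert math.isclose(t * eta, D * log_n, rel_tol=1e-9)
    assert quadratic <= 0.5 * log_n + 1e-9
    assert cubic <= C1 * log_n / (1 - C2 * rho ** (1 / 3)) * (1 + 1e-9)
    print(sigma, (quadratic + cubic - t * eta) / log_n)   # coefficient of log n
```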

Proof of Lemma A.6. Let $\mathcal{H}$ be the family of all functions $h$ of the form

$$h(x) = 1\{x \in Q\}(c + dx),$$

with any interval $Q \subset \mathbb{R}$ and real constants $c, d$ such that $h$ is non-negative. Suppose that there exists a constant $C = C(f)$ such that

$$\lim_{n \to \infty}P\Bigl(\sup_{h \in \mathcal{H}}\frac{|\int h\,d(\mathbb{F}_n - F)|}{\sigma(h)\rho_n^{1/2} + W(h)\rho_n^{2/3}} \le C\Bigr) = 1. \tag{A.13}$$

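For any $m \in \mathbb{N}$, an arbitrary function $\Delta \in \mathcal{D}_m$ may be written as

$$\Delta = \sum_{i=1}^M \pm h_i$$

with $M = 2m + 2$ functions $h_i \in \mathcal{H}$ having pairwise disjoint supports. Consequently,

$$\sigma(\Delta) = \Bigl(\sum_{i=1}^M\sigma(h_i)^2\Bigr)^{1/2} \ge M^{-1/2}\sum_{i=1}^M\sigma(h_i),$$

by the Cauchy-Schwarz inequality, while

$$W(\Delta) = \max_{i=1,\ldots,M}W(h_i) \ge M^{-1}\sum_{i=1}^M W(h_i).$$

Consequently, since $M \le 4m$, (A.13) entails that

$$\begin{aligned}
\Bigl|\int\Delta\,d(\mathbb{F}_n - F)\Bigr| &\le \sum_{i=1}^M\Bigl|\int h_i\,d(\mathbb{F}_n - F)\Bigr|\\
&\le C\Bigl(\sum_{i=1}^M\sigma(h_i)\rho_n^{1/2} + \sum_{i=1}^M W(h_i)\rho_n^{2/3}\Bigr)\\
&\le 4C\bigl(\sigma(\Delta)m^{1/2}\rho_n^{1/2} + W(\Delta)m\rho_n^{2/3}\bigr)
\end{aligned}$$

uniformly in $m \in \mathbb{N}$ and $\Delta \in \mathcal{D}_m$ with probability tending to one as $n \to \infty$.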
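The counting step $M = 2m + 2 \le 4m$ and the two inequalities for functions with disjoint supports can be illustrated with random, hypothetical values standing in for $\sigma(h_i)$ and $W(h_i)$. This is only a check of the arithmetic, not a simulation of the empirical process.

```python
import math
import random

random.seed(1)
for m in range(1, 60):
    M = 2 * m + 2                                     # pieces h_i of a Delta in D_m
    sig = [random.random() for _ in range(M)]         # hypothetical sigma(h_i)
    w = [random.random() for _ in range(M)]           # hypothetical W(h_i)
    sigma_delta = math.sqrt(sum(s * s for s in sig))  # disjoint supports
    W_delta = max(w)
    assert sum(sig) <= math.sqrt(M) * sigma_delta * (1 + 1e-12)   # Cauchy-Schwarz
    assert sum(w) <= M * W_delta * (1 + 1e-12)                    # max >= mean
    assert math.sqrt(M) <= 4 * math.sqrt(m) and M <= 4 * m         # M = 2m + 2 <= 4m
print("decomposition bounds hold")
```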

It remains to verify (A.13). To this end, we use a bracketing argument. With the weight function $w(x) = \max(1, |\varphi(x)|)$, let $-\infty = t_{n,0} < t_{n,1} < \cdots < t_{n,N(n)} = \infty$ be such that, for $I_{n,j} := (t_{n,j-1}, t_{n,j}]$,

$$(2n)^{-1} \le \int_{I_{n,j}}w(x)^2f(x)\,dx \le n^{-1} \quad \text{for } 1 \le j \le N(n),$$

with equality if $j < N(n)$. Since $1 \le \int\exp(t_ow(x))f(x)\,dx < \infty$, and hence $\int w(x)^2f(x)\,dx < \infty$, such a partition exists with $N(n) = O(n)$. For any $h \in \mathcal{H}$, we define functions $h_{n,\ell}, h_{n,u}$ as follows. Let $\{j, \ldots, k\}$ be the set of all indices $i \in \{1, \ldots, N(n)\}$ such that $\{h > 0\} \cap I_{n,i} \ne \emptyset$. We then define
$$h_{n,\ell}(x) := 1\{t_{n,j} < x \le t_{n,k-1}\}\,h(x)$$

and

$$h_{n,u}(x) := h_{n,\ell}(x) + 1\{x \in I_{n,j} \cup I_{n,k}\}\,W(h)\,w(x).$$

Note that $0 \le h_{n,\ell} \le h \le h_{n,u} \le W(h)w$. Consequently, $W(h_{n,\ell}) \le W(h) = W(h_{n,u})$. Suppose, for the moment, that the assertion is true for the (still infinite) family $\mathcal{H}_n := \{h_{n,\ell}, h_{n,u} : h \in \mathcal{H}\}$ in place of $\mathcal{H}$. It then follows from $w \ge 1$ that



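$$\begin{aligned}
\int h\,d(\mathbb{F}_n - F) &\le \int h_{n,u}\,d\mathbb{F}_n - \int h_{n,\ell}\,dF\\
&= \int h_{n,u}\,d(\mathbb{F}_n - F) + \int(h_{n,u} - h_{n,\ell})\,dF\\
&\le \int h_{n,u}\,d(\mathbb{F}_n - F) + W(h)\int_{I_{n,j} \cup I_{n,k}}w(x)^2\,dF(x)\\
&\le \int h_{n,u}\,d(\mathbb{F}_n - F) + 2W(h)n^{-1}\\
&\le C\bigl(\sigma(h)\rho_n^{1/2} + 2^{1/2}W(h)n^{-1/2}\rho_n^{1/2} + W(h)\rho_n^{2/3}\bigr) + 2W(h)n^{-1}\\
&\le (C + o(1))\bigl(\sigma(h)\rho_n^{1/2} + W(h)\rho_n^{2/3}\bigr),
\end{aligned}$$

uniformly in $h \in \mathcal{H}$ with asymptotic probability one; here we used $\sigma(h_{n,u}) \le \sigma(h) + 2^{1/2}W(h)n^{-1/2}$ and $n^{-1/2}\rho_n^{1/2} + n^{-1} = o(\rho_n^{2/3})$. Analogously,

$$\begin{aligned}
\int h\,d(\mathbb{F}_n - F) &\ge \int h_{n,\ell}\,d(\mathbb{F}_n - F) - 2W(h)n^{-1}\\
&\ge -C\bigl(\sigma(h_{n,\ell})\rho_n^{1/2} + W(h)\rho_n^{2/3}\bigr) - 2W(h)n^{-1}\\
&\ge -(C + o(1))\bigl(\sigma(h)\rho_n^{1/2} + W(h)\rho_n^{2/3}\bigr),
\end{aligned}$$

uniformly in $h \in \mathcal{H}$ with asymptotic probability one.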

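To accord with Lemma A.8, we must now deal with $\mathcal{H}_n$. For any $h \in \mathcal{H}$, the function $h_{n,\ell}$ may be written as

$$h_{n,\ell} = h(t_{n,j})\,g^{(1)}_{n,j,k} + h(t_{n,k-1})\,g^{(2)}_{n,j,k}$$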



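with the "triangular functions"

$$g^{(1)}_{n,j,k}(x) := 1\{t_{n,j} < x \le t_{n,k-1}\}\,\frac{t_{n,k-1} - x}{t_{n,k-1} - t_{n,j}}$$

and

$$g^{(2)}_{n,j,k}(x) := 1\{t_{n,j} < x \le t_{n,k-1}\}\,\frac{x - t_{n,j}}{t_{n,k-1} - t_{n,j}}, \quad \text{for } 1 \le j < k \le N(n),\ k - j \ge 2.$$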
In case of $k - j \le 1$, we set $g^{(1)}_{n,j,k} := g^{(2)}_{n,j,k} := 0$. Moreover,

$$h_{n,u} = h_{n,\ell} + W(h)g_{n,j} + 1\{k > j\}W(h)g_{n,k}$$

with $g_{n,i}(x) := 1\{x \in I_{n,i}\}w(x)$. Consequently, all functions in $\mathcal{H}_n$ are linear combinations with non-negative coefficients of at most four functions in the finite family

$$\mathcal{G}_n := \{g_{n,i} : 1 \le i \le N(n)\} \cup \{g^{(1)}_{n,j,k}, g^{(2)}_{n,j,k} : 1 \le j < k \le N(n)\}.$$

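Since $\mathcal{G}_n$ contains $O(n^2)$ functions, it follows from Lemma A.8 that, for some constant $D > 0$,

$$\Bigl|\int g\,d(\mathbb{F}_n - F)\Bigr| \le D\bigl(\sigma(g)\rho_n^{1/2} + W(g)\rho_n^{2/3}\bigr)$$

for all $g \in \mathcal{G}_n$ with asymptotic probability one. The assertion about $\mathcal{H}_n$ now follows from the basic observation that, for $h = \sum_{i=1}^4 a_ig_i$ with non-negative functions $g_i$ and coefficients $a_i \ge 0$,

$$\sigma(h) \ge \Bigl(\sum_{i=1}^4 a_i^2\sigma(g_i)^2\Bigr)^{1/2} \ge 4^{-1/2}\sum_{i=1}^4 a_i\sigma(g_i),$$

$$W(h) \ge \max_{i=1,\ldots,4}a_iW(g_i) \ge 4^{-1}\sum_{i=1}^4 a_iW(g_i),$$

so that $|\int h\,d(\mathbb{F}_n - F)| \le 4D\bigl(\sigma(h)\rho_n^{1/2} + W(h)\rho_n^{2/3}\bigr)$ for all $h \in \mathcal{H}_n$ with asymptotic probability one. □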



A.4. Proofs for the gap problem and of $\hat F_n$'s consistency

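Proof of Theorem 4.3. Suppose that $\hat\varphi_n$ is linear on an interval $[a, b]$. Then, for $x \in [a, b]$ and $\lambda_x := (x - a)/(b - a) \in [0, 1]$,

$$\begin{aligned}
\varphi(x) - (1 - \lambda_x)\varphi(a) - \lambda_x\varphi(b) &= (1 - \lambda_x)\bigl(\varphi(x) - \varphi(a)\bigr) - \lambda_x\bigl(\varphi(b) - \varphi(x)\bigr)\\
&= (1 - \lambda_x)\int_a^x\varphi'(t)\,dt - \lambda_x\int_x^b\varphi'(t)\,dt\\
&= (1 - \lambda_x)\int_a^x\bigl(\varphi'(t) - \varphi'(x)\bigr)\,dt + \lambda_x\int_x^b\bigl(\varphi'(x) - \varphi'(t)\bigr)\,dt\\
&\ge C(1 - \lambda_x)\int_a^x(x - t)\,dt + C\lambda_x\int_x^b(t - x)\,dt\\
&= C(b - a)^2\lambda_x(1 - \lambda_x)/2\\
&= C(b - a)^2/8 \quad \text{if } x = x_o := (a + b)/2.
\end{aligned}$$

Here, the third equality uses $(1 - \lambda_x)(x - a) = \lambda_x(b - x)$, and the inequality uses $\varphi'(s) - \varphi'(t) \ge C(t - s)$ for $s < t$.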
This entails that $\sup_{[a,b]}|\hat\varphi_n - \varphi| \ge C(b - a)^2/16$. For if $\hat\varphi_n \le \varphi + C(b - a)^2/16$ on $\{a, b\}$, then

$$\begin{aligned}
\varphi(x_o) - \hat\varphi_n(x_o) &= \varphi(x_o) - \bigl(\hat\varphi_n(a) + \hat\varphi_n(b)\bigr)/2\\
&\ge \varphi(x_o) - \bigl(\varphi(a) + \varphi(b)\bigr)/2 - C(b - a)^2/16\\
&\ge C(b - a)^2/8 - C(b - a)^2/16 = C(b - a)^2/16.
\end{aligned}$$

Consequently, if $|\hat\varphi_n - \varphi| \le D_n\rho_n^{\beta/(2\beta+1)}$ on $T_n := [A + \rho_n^{1/(2\beta+1)}, B - \rho_n^{1/(2\beta+1)}]$ with $D_n = O_p(1)$, then the longest subinterval of $T_n$ containing no points from $S_n$ has length at most $4D_n^{1/2}C^{-1/2}\rho_n^{\beta/(4\beta+2)}$. Since $T_n$ and $T = [A, B]$ differ by two intervals of length $\rho_n^{1/(2\beta+1)} = O(\rho_n^{\beta/(4\beta+2)})$, these considerations yield the assertion. □
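For a quadratic $\varphi$, the condition $\varphi'(s) - \varphi'(t) \ge C(t - s)$ holds with equality, and so does the chord inequality in the proof of Theorem 4.3. The sketch below checks this identity numerically and evaluates the resulting maximal gap $4D_n^{1/2}C^{-1/2}\rho_n^{\beta/(4\beta+2)}$; the quadratic $\varphi$, the interval $[a, b]$, the values of $D_n$, $C$, $\rho_n$ and $\beta$, and the helper names are hypothetical.

```python
import math


def chord_gap(phi, a, b, x):
    """phi(x) minus the chord of phi through (a, phi(a)) and (b, phi(b))."""
    lam = (x - a) / (b - a)
    return phi(x) - (1 - lam) * phi(a) - lam * phi(b)


C = 2.0                                       # hypothetical curvature constant
phi = lambda x: -0.5 * C * x * x + 0.3 * x    # phi'(s) - phi'(t) = C (t - s) exactly
a, b = -0.4, 1.1
for i in range(101):
    x = a + (b - a) * i / 100
    lam = (x - a) / (b - a)
    assert math.isclose(chord_gap(phi, a, b, x),
                        C * (b - a) ** 2 * lam * (1 - lam) / 2, abs_tol=1e-12)
assert math.isclose(chord_gap(phi, a, b, (a + b) / 2), C * (b - a) ** 2 / 8)


def max_gap(D_n, C, rho, beta):
    """Largest b - a compatible with C (b - a)^2 / 16 <= D_n rho^(beta / (2 beta + 1))."""
    return 4 * math.sqrt(D_n / C) * rho ** (beta / (4 * beta + 2))


g = max_gap(1.0, C, 1e-3, 2.0)
assert math.isclose(C * g ** 2 / 16, 1.0 * 1e-3 ** (2.0 / 5.0))
print(f"maximal knot gap for these hypothetical values: {g:.4f}")
```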

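Proof of Theorem 4.4. Let $\delta_n := \rho_n^{1/(2\beta+1)}$ and $r_n := D\rho_n^{\beta/(4\beta+2)} = D\delta_n^{\beta/2}$ for some constant $D > 0$. Since $r_n \to 0$ but $nr_n \to \infty$, it follows from boundedness of $f$ and a theorem of Stute (1982) about the modulus of continuity of univariate empirical processes that

$$\begin{aligned}
\omega_n := \sup_{x, y \in \mathbb{R}:\,|x - y| \le r_n}\bigl|(\mathbb{F}_n - F)(x) - (\mathbb{F}_n - F)(y)\bigr| &= O_p\bigl(n^{-1/2}r_n^{1/2}\log(1/r_n)^{1/2}\bigr)\\
&= O_p\bigl(\rho_n^{(5\beta+2)/(8\beta+4)}\bigr).
\end{aligned}$$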

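If $D$ is sufficiently large, the asymptotic probability that for any point $x \in [A + \delta_n, B - \delta_n]$ there exists a point $y \in S_n \cap [A + \delta_n, B - \delta_n]$ with $|x - y| \le r_n$ is equal to one. In that case, it follows from Corollary 2.5 and Theorem 4.1 that

$$\begin{aligned}
|(\hat F_n - \mathbb{F}_n)(x)| &\le |(\hat F_n - \mathbb{F}_n)(x) - (\hat F_n - \mathbb{F}_n)(y)| + n^{-1}\\
&\le |(\hat F_n - F)(x) - (\hat F_n - F)(y)| + \omega_n + n^{-1}\\
&\le \int_{\min(x,y)}^{\max(x,y)}|\hat f_n - f|(t)\,dt + \omega_n + n^{-1}\\
&= O_p\bigl(\rho_n^{3\beta/(4\beta+2)}\bigr). \qquad \Box
\end{aligned}$$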
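Tracking only powers of $\rho_n$ (and treating $\log(1/r_n)$ as being of order $\log n$, so that $n^{-1/2}\log(1/r_n)^{1/2}$ is of order $\rho_n^{1/2}$), the rate in the proof of Theorem 4.4 can be verified with exact fractions. The sketch assumes $\beta \in [1, 2]$; it confirms the exponents $(5\beta+2)/(8\beta+4)$ for $\omega_n$ and $3\beta/(4\beta+2)$ for the integral term, that the integral term dominates, and that its exponent exceeds $1/2$ exactly when $\beta > 1$. The function name is ours.

```python
from fractions import Fraction as Fr


def rate_exponents(beta):
    """Exponents of rho_n in the proof of Theorem 4.4 (log factors absorbed as in the text)."""
    gap = beta / (4 * beta + 2)              # r_n = D * rho_n ** gap
    omega = Fr(1, 2) + gap / 2               # n^{-1/2} r_n^{1/2} log(1/r_n)^{1/2}
    integral = gap + beta / (2 * beta + 1)   # r_n * sup |f_hat_n - f| from Theorem 4.1
    return gap, omega, integral


for beta in (Fr(1), Fr(5, 4), Fr(3, 2), Fr(2)):  # assumed range beta in [1, 2]
    gap, omega, integral = rate_exponents(beta)
    assert gap == (beta / 2) * (1 / (2 * beta + 1))   # r_n = D * delta_n ** (beta / 2)
    assert omega == (5 * beta + 2) / (8 * beta + 4)
    assert integral == 3 * beta / (4 * beta + 2)
    assert integral <= omega                          # the integral term dominates
    assert (integral > Fr(1, 2)) == (beta > 1)        # faster than rho_n^(1/2) iff beta > 1
    print(beta, omega, integral)
```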